Resolution: Done
Major - P3
Affects Version/s: 2.2.2
Component/s: Performance, Sharding, Stability
Environment: Linux CentOS 5/6
Every morning since last week, all operations on a sharded collection have been failing.
Here are the application-side errors:
setShardVersion failed host: mdbcis4-01-sv.criteo.prod:27021
'. (Response was { "err" : "setShardVersion failed host: mdbcis4-01-sv.criteo.prod:27021
{ oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), ns: \"counters.statistics\", version: Timestamp 4000|3, versionEpoch: ObjectId('000000000000000000000000'), globalVersion: Timestamp 6000|0, globalVersionEpoch: ObjectId('000000000000000000000000'), reloadConfig: true, errmsg: \"shard global version for collection is higher than trying to set to 'counters.statistics'\", ok: 0.0 }", "code" : 10429, "n" : 0, "ok" : 1.0 }). : MongoDB.Driver.SafeModeResult SendMessage(MongoDB.Driver.Internal.MongoRequestMessage, MongoDB.Driver.SafeMode)
Server-side errors:
warning: aborted moveChunk because official version less than mine?: official 5|1||000000000000000000000000 mine: 6|0||000000000000000000000000
Restarting mongod unlocks operations until the next morning.
Attached: logs of the servers involved in the moveChunk process (shard4 to shard6), sh_status output, and changelog collection output.
In the logs, the issue starts at Wed Dec 12 06:47:09 and ends at Wed Dec 12 09:30:00, after the restart.
Link to our MMS dashboard: https://mms.10gen.com/host/list/4f8d732587d1d86fa8b99c12
The problem occurred before we added the 8th shard and seems to be linked to previous bugs: https://jira.mongodb.org/browse/SERVER-7034 and https://jira.mongodb.org/browse/SERVER-7821
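To make the server-side warning easier to read, here is a minimal sketch of the ordering it seems to describe, assuming the version strings have the form major|minor||epoch as printed in the log line. The names ChunkVersion, parse_chunk_version and is_older are hypothetical and only for illustration; this is not the server's actual implementation.

```python
from typing import NamedTuple


class ChunkVersion(NamedTuple):
    """Hypothetical holder for a version printed as 'major|minor||epoch'."""
    major: int
    minor: int
    epoch: str


def parse_chunk_version(text: str) -> ChunkVersion:
    # Assumption: the log prints versions as 'major|minor||epoch'.
    version_part, epoch = text.strip().split("||")
    major, minor = version_part.split("|")
    return ChunkVersion(int(major), int(minor), epoch)


def is_older(a: ChunkVersion, b: ChunkVersion) -> bool:
    # Compare major first, then minor; the epoch is ignored in this sketch.
    return (a.major, a.minor) < (b.major, b.minor)


# Values copied from the server-side warning.
official = parse_chunk_version("5|1||000000000000000000000000")
mine = parse_chunk_version("6|0||000000000000000000000000")

# The "official" version is older than the shard's own version,
# which matches the condition named in the warning.
print(is_older(official, mine))
```

With the values from the warning, the official version (5|1) sorts below the shard's own version (6|0), which is the situation the message reports when it aborts the moveChunk.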
- related to
SERVER-7821 MongoS blocks all requests to sharded collection
- Closed
SERVER-7034 timeouts for all connections in migrate critical section
- Closed