-
Type: Bug
-
Resolution: Duplicate
-
Priority: Critical - P2
-
None
-
Affects Version/s: 3.4.10
-
Component/s: Sharding
-
None
-
Environment: Ubuntu 14.04
-
ALL
-
Sharding 2018-07-16
Each month, our system creates new, empty sharded collections. To balance the load, we then manually move the newly created chunk of each collection to a dedicated shard, using a command such as:
mongos> sh.moveChunk("sigfox.ShardedCollection_2018_07", { "a" : 1, "b" : 1 }, "sigfoxSet-2");
We have 2 shards, sigfoxSet and sigfoxSet-2. The first one (the primary shard) holds all unsharded collections, and the second one holds all sharded collections. This setup had worked without problems until now.
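For reference, the monthly procedure described above can be sketched as a small helper that builds the two admin commands involved. This is only an illustration, not the script we actually run: the function name and its parameters are hypothetical, and the shard key { a: 1, b: 1 } is assumed from the index shown in this report.

```javascript
// Sketch only: build the admin commands for one month's collection.
// monthlyShardingCommands and its parameters are hypothetical names;
// the shard key { a: 1, b: 1 } is an assumption based on this report.
function monthlyShardingCommands(dbName, year, month, targetShard) {
  var mm = (month < 10 ? "0" : "") + month;
  var ns = dbName + ".ShardedCollection_" + year + "_" + mm;
  return [
    // Shard the new, still empty collection on { a, b }.
    { shardCollection: ns, key: { a: 1, b: 1 } },
    // Move the chunk that contains { a: 1, b: 1 } to the dedicated shard
    // (same effect as the sh.moveChunk() helper call shown above).
    { moveChunk: ns, find: { a: 1, b: 1 }, to: targetShard }
  ];
}
```

In a shell connected to a mongos, each returned document could be passed to db.adminCommand(), for example for monthlyShardingCommands("sigfox", 2018, 7, "sigfoxSet-2").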
This month we ran into a problem when the system started using one of the sharded collections. The mongos did not save the documents to the right shard: it saved them to the primary shard, as if we had never moved the chunk. It was then impossible to read them back, because they were not on the shard that owns the chunk; seen through the mongos, the collection was empty.
Usually, when a mongos starts using a new sharded collection, we see something like this in the logs:
2018-07-02T14:22:07.970+0200 I SHARDING [conn309] Refreshing chunks for collection sigfox.ShardedCollection_2018_07 based on version 2|971||5b10f56dde1fd15066f7b6ff
This time, however, nothing appeared in the logs until we detected the problem and forced a restart of the mongos. After the restart, everything worked as if nothing had happened.
We recovered the documents by running mongodump directly against the replica set of the primary shard (the wrong shard) and restoring them through a mongos.
mongos> db.ShardedCollection_2018_07.find({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true}).explain()
{
  "queryPlanner" : {
    "mongosPlannerVersion" : 1,
    "winningPlan" : {
      "stage" : "SINGLE_SHARD",
      "shards" : [
        {
          "shardName" : "sigfoxSet-2",
          "connectionString" : "sigfoxSet-2/xxx.xxx.xxx.xxx",
          "serverInfo" : {
            "host" : "mongo-2a",
            "port" : 27017,
            "version" : "3.4.10",
            "gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
          },
          "plannerVersion" : 1,
          "namespace" : "sigfox.ShardedCollection_2018_07",
          "indexFilterSet" : false,
          "parsedQuery" : {
            "$and" : [
              { "a" : { "$eq" : NumberLong(1768362) } },
              { "b" : { "$eq" : NumberLong("1530403200000") } }
            ]
          },
          "winningPlan" : {
            "stage" : "PROJECTION",
            "transformBy" : { "_id" : true },
            "inputStage" : {
              "stage" : "FETCH",
              "inputStage" : {
                "stage" : "SHARDING_FILTER",
                "inputStage" : {
                  "stage" : "IXSCAN",
                  "keyPattern" : { "a" : 1, "b" : 1 },
                  "indexName" : "a_1_b_1",
                  "isMultiKey" : false,
                  "multiKeyPaths" : { "a" : [ ], "b" : [ ] },
                  "isUnique" : false,
                  "isSparse" : false,
                  "isPartial" : false,
                  "indexVersion" : 1,
                  "direction" : "forward",
                  "indexBounds" : {
                    "a" : [ "[1768362, 1768362]" ],
                    "b" : [ "[1530403200000, 1530403200000]" ]
                  }
                }
              }
            }
          },
          "rejectedPlans" : [...]
        }
      ]
    }
  },
  "ok" : 1
}

sigfoxSet:SECONDARY> db.ShardedCollection_2018_07.findOne({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true})
{ "_id" : ObjectId("5b381980e541cd4403df3a67") }

sigfoxSet-2:SECONDARY> db.DeviceMessage_2018_07.findOne({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true})
null
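The manual check above (ask the mongos which shard it targets, then look for the document directly on each shard) could be wrapped in a helper along these lines. This is a rough sketch under assumptions: shardsHoldingDocument and the shardDbs parameter are hypothetical names, and shardDbs is assumed to map each shard name to a database handle opened directly on that shard's replica set, not through the mongos.

```javascript
// Sketch only: list the shards on which a document matching `query` exists.
// `shardDbs` is a hypothetical map of shard name -> database handle opened
// directly on that shard (bypassing the mongos).
function shardsHoldingDocument(shardDbs, collName, query) {
  var holders = [];
  Object.keys(shardDbs).forEach(function (shardName) {
    // Same direct lookup as the findOne() calls above, _id only.
    var doc = shardDbs[shardName].getCollection(collName).findOne(query, { _id: true });
    if (doc !== null) {
      holders.push(shardName);
    }
  });
  return holders;
}
```

If the result does not contain the shardName reported by explain() on the mongos (sigfoxSet-2 in the output above), the document sits on a shard that the mongos does not target for that key.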
Feel free to ask for additional information if needed.
Thanks
- duplicates
-
SERVER-32198 Missing collection metadata on the shard implies both UNSHARDED and "metadata not loaded yet"
- Closed