SERVER-35932 (Core Server)

Mongos saved docs in wrong shard

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: 3.4.10
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Environment:
      Ubuntu 14.04
    • Operating System:
      ALL
    • Sprint:
      Sharding 2018-07-16

      Description

      Each month, our system creates new empty sharded collections. To balance the load, we then manually move each newly created chunk to a dedicated shard using a command like:

      mongos> sh.moveChunk("sigfox.ShardedCollection_2018_07", { "a" : 1, "b" : 1 }, "sigfoxSet-2");
      

      So we have 2 shards (sigfoxSet and sigfoxSet-2): the first one (the primary shard) contains all unsharded collections and the second one contains all sharded collections. This setup worked perfectly until this month.
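
      One way to confirm where the chunks of a collection actually live after a moveChunk is to count them per shard in the config database. The following is only a sketch: `chunksPerShard` is a hypothetical helper, not a MongoDB function, and it assumes the 3.4-era `config.chunks` layout in which each chunk document carries `ns` and `shard` fields.

```javascript
// Hypothetical helper (not part of MongoDB): counts chunks per shard for one
// namespace. Assumes each config.chunks document has `ns` and `shard` fields,
// as in MongoDB 3.4.
function chunksPerShard(configDb, ns) {
  const counts = {};
  configDb.chunks.find({ ns: ns }).forEach(function (chunk) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  });
  return counts;
}

// Stand-in for the config database so the sketch runs outside the mongo shell.
// The chunk documents below are illustrative, not taken from the real cluster.
const fakeConfigDb = {
  chunks: {
    find: function (query) {
      const docs = [
        { ns: "sigfox.ShardedCollection_2018_07", shard: "sigfoxSet-2" },
        { ns: "sigfox.OtherCollection", shard: "sigfoxSet" },
      ];
      return docs.filter(function (doc) {
        return doc.ns === query.ns;
      });
    },
  },
};

console.log(chunksPerShard(fakeConfigDb, "sigfox.ShardedCollection_2018_07"));
```

      In a mongos shell session, the same helper could be called with `db.getSiblingDB("config")` as the first argument instead of the stand-in object.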

      This month we ran into a problem when the system started using one of the sharded collections. The mongos did not save the docs in the right shard: it saved them in the primary shard, as if we had never moved the chunks. It was then impossible to read them back through the mongos, because they were not in the shard it queried, so the collection appeared empty from the mongos.

      Usually, when the mongos starts using a new sharded collection, we see something like this in the logs:

      2018-07-02T14:22:07.970+0200 I SHARDING [conn309] Refreshing chunks for collection sigfox.ShardedCollection_2018_07 based on version 2|971||5b10f56dde1fd15066f7b6ff
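
      When scanning mongos logs for the presence or absence of this line, it can help to pull out the namespace and chunk version. The sketch below assumes the version is printed as `major|minor||epoch`; `parseChunkRefreshLine` is a hypothetical helper written for this report, not a MongoDB utility.

```javascript
// Hypothetical helper (not part of MongoDB): parses the routing-table refresh
// line logged by mongos, assuming a "major|minor||epoch" version layout.
function parseChunkRefreshLine(line) {
  const match = line.match(
    /Refreshing chunks for collection (\S+) based on version (\d+)\|(\d+)\|\|([0-9a-f]{24})/
  );
  if (match === null) {
    return null; // not a refresh line
  }
  return {
    namespace: match[1],
    major: Number(match[2]),
    minor: Number(match[3]),
    epoch: match[4],
  };
}

// The log line quoted in this report.
const sample =
  "2018-07-02T14:22:07.970+0200 I SHARDING [conn309] Refreshing chunks for collection " +
  "sigfox.ShardedCollection_2018_07 based on version 2|971||5b10f56dde1fd15066f7b6ff";

console.log(parseChunkRefreshLine(sample));
```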

      But this time there was nothing in the logs until we detected the problem and forced a restart of the mongos. After the restart, everything worked correctly, as if nothing had happened.

      We dumped the docs directly from the wrong shard by running mongodump against the replica set of the primary shard, then restored them through a mongos.
       

      mongos> db.ShardedCollection_2018_07.find({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true}).explain()
      {
      	"queryPlanner" : {
      		"mongosPlannerVersion" : 1,
      		"winningPlan" : {
      			"stage" : "SINGLE_SHARD",
      			"shards" : [
      				{
      					"shardName" : "sigfoxSet-2",
      					"connectionString" : "sigfoxSet-2/xxx.xxx.xxx.xxx",
      					"serverInfo" : {
      						"host" : "mongo-2a",
      						"port" : 27017,
      						"version" : "3.4.10",
      						"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
      					},
      					"plannerVersion" : 1,
      					"namespace" : "sigfox.ShardedCollection_2018_07",
      					"indexFilterSet" : false,
      					"parsedQuery" : {
      						"$and" : [
      							{
      								"a" : {
      									"$eq" : NumberLong(1768362)
      								}
      							},
      							{
      								"b" : {
      									"$eq" : NumberLong("1530403200000")
      								}
      							}
      						]
      					},
      					"winningPlan" : {
      						"stage" : "PROJECTION",
      						"transformBy" : {
      							"_id" : true
      						},
      						"inputStage" : {
      							"stage" : "FETCH",
      							"inputStage" : {
      								"stage" : "SHARDING_FILTER",
      								"inputStage" : {
      									"stage" : "IXSCAN",
      									"keyPattern" : {
      										"a" : 1,
      										"b" : 1
      									},
      									"indexName" : "a_1_b_1",
      									"isMultiKey" : false,
      									"multiKeyPaths" : {
      										"a" : [ ],
      										"b" : [ ]
      									},
      									"isUnique" : false,
      									"isSparse" : false,
      									"isPartial" : false,
      									"indexVersion" : 1,
      									"direction" : "forward",
      									"indexBounds" : {
      										"a" : [
      											"[1768362, 1768362]"
      										],
      										"b" : [
      											"[1530403200000, 1530403200000]"
      										]
      									}
      								}
      							}
      						}
      					},
      					"rejectedPlans" : [...]
      				}
      			]
      		}
      	},
      	"ok" : 1
      }
       
      sigfoxSet:SECONDARY> db.ShardedCollection_2018_07.findOne({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true})
      { "_id" : ObjectId("5b381980e541cd4403df3a67") }
       
      sigfoxSet-2:SECONDARY> db.ShardedCollection_2018_07.findOne({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true})
      null
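
      The mismatch in the transcripts above (mongos targets one shard while the document sits on another) can be checked mechanically. This is a minimal sketch, assuming the explain() output has the shape shown in this report; `targetedShards` is a hypothetical helper and the explain object is a trimmed copy containing only the fields it reads.

```javascript
// Hypothetical helper: lists the shards that mongos targeted, based on the
// shape of the explain() output shown in this report.
function targetedShards(explainOutput) {
  const plan = explainOutput.queryPlanner.winningPlan;
  return (plan.shards || []).map(function (shard) {
    return shard.shardName;
  });
}

// Trimmed copy of the explain output above.
const explainOutput = {
  queryPlanner: {
    mongosPlannerVersion: 1,
    winningPlan: {
      stage: "SINGLE_SHARD",
      shards: [{ shardName: "sigfoxSet-2" }],
    },
  },
  ok: 1,
};

// Shard on which the direct findOne() actually returned the document.
const shardHoldingDoc = "sigfoxSet";

const targeted = targetedShards(explainOutput);
console.log("mongos targets:", targeted);
console.log("document reachable via mongos:", targeted.includes(shardHoldingDoc));
```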
      

      Feel free to ask for additional information if needed.
      Thanks

       
