[SERVER-35932] Mongos saved docs in wrong shard Created: 02/Jul/18  Updated: 10/Jul/18  Resolved: 10/Jul/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.10
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Pierre Coquentin Assignee: Esha Maharishi (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 14.04


Issue Links:
Duplicate
duplicates SERVER-32198 Missing collection metadata on the sh... Closed
Operating System: ALL
Sprint: Sharding 2018-07-16
Participants:

 Description   

Each month, our system creates new empty sharded collections; then, to balance the load, we manually move each newly created chunk to a dedicated shard using the following command:

mongos> sh.moveChunk("sigfox.ShardedCollection_2018_07", { "a" : 1, "b" : 1 }, "sigfoxSet-2");

So we have 2 shards (sigfoxSet and sigfoxSet-2): the first one (the primary shard) contains all unsharded collections and the second one contains all sharded collections. Everything worked perfectly until it didn't.
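
For reference, the full monthly sequence looks roughly like the sketch below; only the moveChunk is the exact command we run, the enableSharding/shardCollection calls are approximate, and the shard key { "a" : 1, "b" : 1 } matches the index shown in the explain output further down:

mongos> sh.enableSharding("sigfox");
mongos> sh.shardCollection("sigfox.ShardedCollection_2018_07", { "a" : 1, "b" : 1 });
// the new collection starts with a single empty chunk; move it off the primary shard
mongos> sh.moveChunk("sigfox.ShardedCollection_2018_07", { "a" : 1, "b" : 1 }, "sigfoxSet-2");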

This month we ran into a problem when the system started using one of the sharded collections. The mongos didn't save the docs in the right shard: it saved them in the primary shard as if we hadn't moved the chunks previously, and it was impossible to read them afterward because they were not in the correct shard; the collection appeared empty through the mongos.

Usually, when the mongos starts using a new sharded collection, we see something like this in the logs:

2018-07-02T14:22:07.970+0200 I SHARDING [conn309] Refreshing chunks for collection sigfox.ShardedCollection_2018_07 based on version 2|971||5b10f56dde1fd15066f7b6ff

But this time there was nothing in the logs until we detected the problem and forced a restart of the mongos. After the restart, everything worked perfectly, as if nothing had happened.

We dumped the docs directly from the wrong shard by running mongodump against the replica set of the primary shard and restored them through a mongos.
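
Roughly, the commands were of the following form (host names are placeholders, not the real ones):

mongodump --host sigfoxSet/xxx.xxx.xxx.xxx:27017 --db sigfox --collection ShardedCollection_2018_07 --out ./dump
mongorestore --host <mongos-host>:27017 --db sigfox --collection ShardedCollection_2018_07 ./dump/sigfox/ShardedCollection_2018_07.bson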
 

mongos> db.ShardedCollection_2018_07.find({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true}).explain()
{
	"queryPlanner" : {
		"mongosPlannerVersion" : 1,
		"winningPlan" : {
			"stage" : "SINGLE_SHARD",
			"shards" : [
				{
					"shardName" : "sigfoxSet-2",
					"connectionString" : "sigfoxSet-2/xxx.xxx.xxx.xxx",
					"serverInfo" : {
						"host" : "mongo-2a",
						"port" : 27017,
						"version" : "3.4.10",
						"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
					},
					"plannerVersion" : 1,
					"namespace" : "sigfox.ShardedCollection_2018_07",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"$and" : [
							{
								"a" : {
									"$eq" : NumberLong(1768362)
								}
							},
							{
								"b" : {
									"$eq" : NumberLong("1530403200000")
								}
							}
						]
					},
					"winningPlan" : {
						"stage" : "PROJECTION",
						"transformBy" : {
							"_id" : true
						},
						"inputStage" : {
							"stage" : "FETCH",
							"inputStage" : {
								"stage" : "SHARDING_FILTER",
								"inputStage" : {
									"stage" : "IXSCAN",
									"keyPattern" : {
										"a" : 1,
										"b" : 1
									},
									"indexName" : "a_1_b_1",
									"isMultiKey" : false,
									"multiKeyPaths" : {
										"a" : [ ],
										"b" : [ ]
									},
									"isUnique" : false,
									"isSparse" : false,
									"isPartial" : false,
									"indexVersion" : 1,
									"direction" : "forward",
									"indexBounds" : {
										"a" : [
											"[1768362, 1768362]"
										],
										"b" : [
											"[1530403200000, 1530403200000]"
										]
									}
								}
							}
						}
					},
					"rejectedPlans" : [...]
				}
			]
		}
	},
	"ok" : 1
}
 
sigfoxSet:SECONDARY> db.ShardedCollection_2018_07.findOne({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true})
{ "_id" : ObjectId("5b381980e541cd4403df3a67") }
 
sigfoxSet-2:SECONDARY> db.ShardedCollection_2018_07.findOne({"a" : NumberLong(1768362), "b" : NumberLong("1530403200000")}, {_id: true})
null

Feel free to ask for additional information if needed.
Thanks

 



 Comments   
Comment by Pierre Coquentin [ 10/Jul/18 ]

Indeed, we had an unexpected election on the primary shard a few days after the moveChunk; it fits all the symptoms. So, until we upgrade to the latest version, we must remember to restart the mongoses if an election is triggered.
Thank you for your clear explanation,

Pierre

Comment by Esha Maharishi (Inactive) [ 09/Jul/18 ]

tldr;

If this is happening infrequently and only after sharding a collection (as opposed to after a regular migration), it may be due to elections on the primary shard triggering the bug (SERVER-32198) that asya mentioned in her earlier comment.

PierreCoquentin, can you check if this issue only occurs if the primary shard had an election around the time the issue manifested?
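
(One quick way to check might be to look at when the current primary of the primary shard was elected, for example with something like the snippet below; this is just a sketch:)

sigfoxSet:PRIMARY> rs.status().members.filter(function(m) { return m.stateStr === "PRIMARY"; })[0].electionDate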

Note that we fixed this in 4.0 and backported the fix to 3.6.2 under SERVER-32202.

_______________________________

Full explanation:

The fact that this happens on a recurring basis after sharding a collection (i.e., it only affects writes after an unsharded -> sharded transition, rather than writes after a migration) suggests it may have to do with the fact that we clear a shard's routing table cache when it steps up to primary or steps down to secondary.

This fits the description, because it means a shard would accept a request from mongos sent with an UNSHARDED shardVersion, since a shard interprets not having a routing table cache entry as meaning that the collection is unsharded. As a result, the shard would not return a stale shardVersion error, which would otherwise have triggered mongos to refresh its routing table and re-route.
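
(If this recurs, one way to see what version the shard itself has cached for the collection might be to run getShardVersion directly against the shard primary; this is a sketch and I have not verified it on 3.4.10:)

sigfoxSet:PRIMARY> db.adminCommand({ getShardVersion: "sigfox.ShardedCollection_2018_07", fullMetadata: true })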

It also fits because after sharding the collection, all mongoses would be routing requests to the primary shard with the UNSHARDED shardVersion until they hear such a stale shardVersion error from the primary shard, or until they are restarted (because when a mongos loads the routing table for a database for the first time, it also loads all the sharded collections).
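
(As an aside, a possibly lighter-weight alternative to restarting each mongos might be to force a routing table refresh with flushRouterConfig; this is only a sketch and I have not verified it against this exact scenario:)

mongos> db.adminCommand({ flushRouterConfig: 1 })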

However, I would expect the mongoses that ran the shardCollection and moveChunk commands to be able to target correctly without needing a restart, since a mongos marks its routing table entry for a collection as stale after sharding the collection and after sending a moveChunk for it. If there are many mongoses, perhaps it simply went unnoticed that writes through the (presumably single) mongos through which shardCollection and moveChunk were run were actually targeted correctly?

Comment by Pierre Coquentin [ 04/Jul/18 ]

A small detail I forgot: our application is composed of multiple application nodes, each with its own mongos, and during the incident all of them were saving to the wrong shard until I restarted them one by one.
To answer your questions:
1) Yes, we manually move the chunks of the newly created collections before they are used (so basically, there is only one empty chunk), and we have disabled the auto-balancing provided by mongo.
2) The driver used is the Java driver, so the impacted operations were "save" (which, if I am not mistaken, is a replace because we provide the id), "update" (upsert: false) and "findAndModify" (upsert: false); rough shell equivalents are sketched after this list.
3) When trying to access the missing documents through the impacted mongos, I was using find and count, both of which returned 0 results. To access the documents, I had to connect directly to the primary shard.
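
For clarity, rough mongo shell equivalents of those operations are shown below; the _id and shard key values reuse those from the explain output above, and field names like "status" are made up for illustration:

mongos> db.ShardedCollection_2018_07.update({ _id: ObjectId("5b381980e541cd4403df3a67") }, { a: NumberLong(1768362), b: NumberLong("1530403200000"), status: "..." })   // "save": replacement-style update by _id
mongos> db.ShardedCollection_2018_07.update({ a: NumberLong(1768362), b: NumberLong("1530403200000") }, { $set: { status: "processed" } }, { upsert: false })
mongos> db.ShardedCollection_2018_07.findAndModify({ query: { a: NumberLong(1768362), b: NumberLong("1530403200000") }, update: { $set: { status: "processed" } }, upsert: false })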

Comment by Esha Maharishi (Inactive) [ 03/Jul/18 ]

kaloian.manassiev, from what I can tell, the write path on 3.4.10 mongos does attach the UNSHARDED version.

However, not all reads are versioned in 3.4.10, particularly if mongos thinks the collection is unsharded. I have not enumerated these here since the issue seems to be with writes.

PierreCoquentin, one thing I noticed you said was:

So we have 2 shards (sigfoxSet and sigfoxSet-2): the first one (the primary shard) contains all unsharded collections and the second one contains all sharded collections. (emphasis mine)

1) Does this mean you are moving all chunks for sharded collections to the second shard? Or are you letting the chunks be balanced evenly across your two shards?

Also, you mentioned

The mongos didn't save the docs in the right shard: it saved them in the primary shard as if we hadn't moved the chunks previously, and it was impossible to read them afterward because they were not in the correct shard; the collection appeared empty through the mongos.

Two further questions:

2) What kind of write do you mean by "save"? Insert, op-style update with upsert: true, or replacement-style update with upsert: true?

3) When you tried to read the docs, did you do so from the same mongos as you used to write them? If so, what kind of read did you use (aggregate, find, count)?
