Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Affects Version/s: 3.6.8, 4.0.9
Component/s: None
Sharding 2019-06-17, Sharding 2019-07-01
original description
We had a production incident last week in which reads from secondaries returned inconsistent results for a collection: a secondary did not return a document because of a chunk move and a stale chunk map. Running flushRouterConfig fixed the issue.
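For reference, here is a minimal sketch of how that workaround could be scripted. Each mongos caches routing metadata independently, so the command is issued to every router rather than to just one. The helper name `flush_all_routers` and the host names in the comments are hypothetical; the function assumes PyMongo-style clients that expose `client.admin.command(...)`.

```python
def flush_all_routers(clients):
    """Run flushRouterConfig on every mongos and collect the replies.

    `clients` maps a label (e.g. the host name) to a client object that is
    connected directly to one mongos. PyMongo's MongoClient fits this shape,
    but any object with `.admin.command(name)` works.
    """
    replies = {}
    for label, client in clients.items():
        # flushRouterConfig clears the cached routing table on the mongos
        # the client is connected to; it is reloaded on next use.
        replies[label] = client.admin.command("flushRouterConfig")
    return replies


# Possible usage (hypothetical hosts, requires PyMongo and a running cluster):
#
#   from pymongo import MongoClient
#   hosts = ["mongos1.example.net:27017", "mongos2.example.net:27017"]
#   clients = {h: MongoClient(h) for h in hosts}
#   print(flush_all_routers(clients))
```

This only clears the cache after the fact; it does not prevent a router from becoming stale again after the next migration.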
We have a collection with very little read and write activity, and the vast majority of its reads occur on secondaries. This collection was sharded on the field {k: 1}, and the vast majority of find queries are a findOne with {k: "someValue"}.
What appears to be the case is that when a chunk migration happens, the mongos reloads the collection version only when an action is performed against the primary (read or write). If there is no action against the primary, the mongos will still use the old chunk map. This is problematic when adding shards and rebalancing, which is what we had done.
On a read-only collection, or on a collection that rarely receives writes like ours, it is possible for secondary reads to miss documents after a chunk migration to a shard that didn't previously exist in the chunk map.
It appears that if the collection receives a read or write to the primary, the mongos that issued the command will update its chunk map, but the other mongos instances will not.
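The behavior described above can be illustrated with a small toy model. This is not MongoDB's implementation: the classes `Cluster`, `Shard`, and `Router` are hypothetical, each key is treated as its own chunk, and the rule "only a primary operation triggers a refresh" simply encodes what we observed.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    docs: dict = field(default_factory=dict)  # k -> document


@dataclass
class Cluster:
    shards: dict        # shard name -> Shard
    chunk_owner: dict   # k -> shard name (authoritative chunk map)
    version: int = 1    # bumped on every migration

    def move_chunk(self, k, to):
        source = self.shards[self.chunk_owner[k]]
        self.shards[to].docs[k] = source.docs.pop(k)
        self.chunk_owner[k] = to
        self.version += 1


class Router:
    """Toy mongos: caches the chunk map and refreshes only on primary ops."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.refresh()

    def refresh(self):
        self.cached_owner = dict(self.cluster.chunk_owner)
        self.cached_version = self.cluster.version

    def find_one(self, k, read_pref="primary"):
        # Assumption: only a primary operation detects the stale version.
        if read_pref == "primary" and self.cached_version != self.cluster.version:
            self.refresh()
        shard = self.cluster.shards[self.cached_owner[k]]
        return shard.docs.get(k)


cluster = Cluster(
    shards={
        "shardA": Shard("shardA", {"someValue": {"k": "someValue"}}),
        "shardB": Shard("shardB"),
    },
    chunk_owner={"someValue": "shardA"},
)
mongos1 = Router(cluster)
mongos2 = Router(cluster)

cluster.move_chunk("someValue", "shardB")  # e.g. rebalancing onto a new shard

stale = mongos1.find_one("someValue", "secondary")  # routed to shardA: miss
fresh = mongos1.find_one("someValue", "primary")    # refreshes mongos1 only
after = mongos1.find_one("someValue", "secondary")  # now routed to shardB
other = mongos2.find_one("someValue", "secondary")  # mongos2 is still stale
```

In this model `stale` and `other` are `None` while `fresh` and `after` return the document, which matches the inconsistent results we saw across routers.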
We're running 3.6.8 on MMAP.