This bug only affects clusters whose config servers run as a replica set (CSRS); it cannot occur in 3.0, 2.6 or 2.4 series clusters.
In a sharded cluster with CSRS config servers that is moving some chunk, C, from a donor shard to a recipient shard,
If the donor shard replica set primary node (or standalone node) crashes during the chunk migration critical section after writing the chunk metadata changes to the config server,
And some mongos that is not aware of the change to the chunk metadata tries to route a write for the donated chunk to the donor shard,
And the new donor replica set primary node (or restarted standalone node) contacts a lagged CSRS secondary that has stale chunk information,
Then the new donor node will accept the write even though it does not own the chunk, leading to a lost write.
The problem is that the donor replica set does not remember that it is finishing a chunk migration across failovers and restarts, and also does not durably remember the minimum config server optime corresponding to its most recently completed metadata operation.
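
The scenario above can be sketched as a toy model. This is only an illustration and not the server's implementation: `ConfigNode`, `DonorShard`, `durable_min_optime` and `run_scenario` are hypothetical names, optimes are plain integers, and the model assumes the minimum optime is recorded before the crash (it ignores a crash between the metadata commit and that record being written).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConfigNode:
    """One config server replica set member (hypothetical model)."""
    optime: int = 0
    owner: Dict[str, str] = field(default_factory=dict)  # chunk -> owning shard


@dataclass
class DonorShard:
    """Donor shard node; only durable_min_optime survives a crash."""
    name: str
    durable_min_optime: Optional[int] = None  # hypothetical on-disk record
    known_owner: Dict[str, str] = field(default_factory=dict)  # in-memory only

    def crash(self) -> None:
        # A failover or restart loses the in-memory chunk metadata.
        self.known_owner = {}

    def recover(self, node: ConfigNode) -> bool:
        """Reload chunk ownership from node; refuse a member that is too stale."""
        if self.durable_min_optime is not None and node.optime < self.durable_min_optime:
            return False
        self.known_owner = dict(node.owner)
        return True

    def accepts_write(self, chunk: str) -> bool:
        return self.known_owner.get(chunk) == self.name


def run_scenario(persist_min_optime: bool) -> bool:
    """Return True if the donor wrongly accepts a write for chunk C."""
    primary = ConfigNode(optime=1, owner={"C": "donor"})
    lagged = ConfigNode(optime=1, owner={"C": "donor"})
    donor = DonorShard(name="donor", known_owner={"C": "donor"})

    # Critical section: the metadata change reaches the config primary,
    # but the lagged secondary has not replicated it yet.
    primary.optime = 2
    primary.owner["C"] = "recipient"
    if persist_min_optime:
        donor.durable_min_optime = primary.optime

    donor.crash()

    # The new donor primary happens to contact the lagged secondary first,
    # and falls back to the up-to-date member only if it rejects the stale one.
    if not donor.recover(lagged):
        donor.recover(primary)

    # A stale mongos now routes a write for chunk C to the donor.
    return donor.accepts_write("C")
```

Calling `run_scenario(False)` reproduces the lost write: the donor reloads stale metadata and accepts a write for a chunk it no longer owns. With `run_scenario(True)` the recorded optime lets the donor reject the lagged member, so the write is refused.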
related to:
- SERVER-20889 Introduce means to disable sharding minOpTime recovery (Closed)
- SERVER-21033 Sharding minOpTime info writes should not wait for read concern while holding a lock (Closed)
- SERVER-20824 Test for sharding state recovery (Closed)