When multiple moveChunk commands "pile up" on a shard, only the first one actually runs the migration: it performs the range deletion and then sets the last opTime on its ReplClientInfo.
The other moveChunk commands merely join the active migration, so when they go on to wait for write concern before returning, they are waiting on an opTime that does not include the deletes from the range deletion.
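The sketch below is a simplified model of that mechanism, not the actual mongod code; the names (ReplicationLog, Client, active_move_chunk, joining_move_chunk) are hypothetical and exist only to illustrate why the joining command's write-concern wait does not cover the range deletion.

```python
# Illustrative model only -- not the mongod implementation.

class ReplicationLog:
    """Toy oplog: opTimes are just increasing integers."""
    def __init__(self):
        self.last_applied = 0
        self.majority_committed = 0

    def write(self):
        self.last_applied += 1
        return self.last_applied

    def wait_for_majority(self, op_time):
        # Stand-in for blocking until op_time is majority-replicated.
        return op_time <= self.majority_committed


class Client:
    """Per-command state; plays the role of ReplClientInfo's lastOp."""
    def __init__(self):
        self.last_op_time = 0


def active_move_chunk(client, oplog):
    # The first moveChunk does the work: it deletes the donated range and
    # then records the resulting opTime on *its own* client.
    delete_op_time = oplog.write()      # the range deletion writes
    client.last_op_time = delete_op_time


def joining_move_chunk(client, oplog):
    # A command that only joins the active migration never advances its
    # own client's lastOp, so it waits on an opTime that predates the
    # range deletion and can return before those deletes have replicated.
    return oplog.wait_for_majority(client.last_op_time)
```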
As a result, the following sequence is possible (and occurred frequently in BF-5452): a config server stepdown happens during a manual migration initiated through mongos; mongos retries the manual migration; and the second, joining migration returns before the range deletes have actually replicated.
If mongos then performs a secondary read that includes the donated range (which in v3.4 is unversioned, so it is sent to the donor shard), the read can return duplicate documents, because they have not yet been deleted on the donor's secondaries. This is true even if the moveChunk request had waitForDelete: true and writeConcern: majority, and the read had readConcern: majority.
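A hedged reproduction sketch of that scenario using PyMongo against a sharded v3.4 cluster; the namespace test.coll, the target shard name, the chunk bounds, and the use of the _waitForDelete field of the moveChunk admin command are assumptions for illustration, and the stepdown/retry timing itself is not driven here.

```python
# Reproduction sketch. Assumes: a sharded test.coll, a destination shard
# named "shard0001", and a chunk containing {_id: 0} on the donor shard.
from pymongo import MongoClient, ReadPreference
from pymongo.read_concern import ReadConcern

mongos = MongoClient("mongodb://localhost:27017")

# Manual migration through mongos. If a config stepdown causes mongos to
# retry and the retried command only joins the active migration, it may
# return before the donor's range deletes have replicated, despite the
# wait-for-delete flag and the majority write concern.
mongos.admin.command({
    "moveChunk": "test.coll",
    "find": {"_id": 0},
    "to": "shard0001",
    "_waitForDelete": True,
    "writeConcern": {"w": "majority"},
})

# Secondary read over the donated range. In v3.4 this read is unversioned,
# so it is routed to the donor shard, whose secondaries may still hold the
# not-yet-deleted documents: duplicates even with readConcern: majority.
coll = mongos.test.coll.with_options(
    read_preference=ReadPreference.SECONDARY,
    read_concern=ReadConcern("majority"),
)
docs = list(coll.find({"_id": {"$gte": 0}}))
print(len(docs))
```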
- Related to: SERVER-30183, "a moveChunk that joins the active moveChunk on a shard may not respect its waitForDelete" (Closed)