[SERVER-29834] only the active moveChunk on a shard will set the last opTime to wait for writeConcern for
Created: 23/Jun/17  Updated: 30/Oct/23  Resolved: 18/Jul/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.5, 3.5.9
Fix Version/s: 3.5.11

Type: Bug Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-30183 a moveChunk that joins the active mov... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4
Sprint: Sharding 2017-07-31
Participants:
Linked BF Score: 0

 Description   

When multiple moveChunk commands "pile up" on a shard, only the first one actually runs; it performs the range deletion and then sets the last opTime on its ReplClientInfo.

Therefore, when the other moveChunk commands go to wait for writeConcern before returning, they are not waiting on an opTime that includes the deletes performed by the range deletion.

So it is possible (as occurred frequently in BF-5452) for a config server stepdown to happen during a manual migration initiated through mongos, for mongos to retry the migration, and for the second attempt to return before the range deletes have actually replicated.

If mongos then performs a secondary read that includes the donated range (which in v3.4 is unversioned, so it will be sent to the donor shard), the read can return duplicate documents, because the donated documents have not yet been deleted from the donor. This is true even if the moveChunk request used waitForDelete: true and writeConcern: majority, and the read used readConcern: majority.
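
For illustration, here is a minimal C++ sketch of the problematic shape. All names are hypothetical stand-ins, not the actual mongod source: only the request that runs the migration records the range deletion's opTime on its client, while joining requests leave their last opTime untouched.

{code:cpp}
#include <mutex>

// Hypothetical stand-ins for server types; not the actual mongod source.
struct OpTime {
    long long ts = 0;
};

struct Client {
    OpTime lastOp;  // the opTime the command layer's writeConcern wait uses
};

struct ActiveMigration {
    std::mutex mtx;
    bool running = false;
};

// Pretend this moves the chunk, deletes the donated range, and returns the
// opTime of the last range-deletion delete.
OpTime performMigrationAndRangeDeletion() {
    OpTime deletionOpTime;
    deletionOpTime.ts = 42;
    return deletionOpTime;
}

// Pretend this blocks until the active migration has finished.
void waitForActiveMigrationToFinish(ActiveMigration*) {}

void moveChunk(Client* client, ActiveMigration* active) {
    bool iAmActive = false;
    {
        std::lock_guard<std::mutex> lk(active->mtx);
        if (!active->running) {
            active->running = true;
            iAmActive = true;
        }
    }

    if (iAmActive) {
        // Only the active request records the range deletion's opTime, so
        // only its writeConcern wait covers the deletes.
        client->lastOp = performMigrationAndRangeDeletion();
    } else {
        // Joining requests block until the migration is done but never
        // advance client->lastOp, so their writeConcern wait can return
        // before the range-deletion deletes have replicated.
        waitForActiveMigrationToFinish(active);
    }

    // In the real server the command layer then waits for writeConcern on
    // client->lastOp before returning to the client.
}
{code}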



 Comments   
Comment by Githook User [ 18/Jul/17 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-29834 make every moveChunk request set the last opTime to wait for writeConcern for
Branch: master
https://github.com/mongodb/mongo/commit/8cd40debb70c269710eebdb5bdfc3e70e0b935f9

Comment by Esha Maharishi (Inactive) [ 23/Jun/17 ]

Though the Safe Secondary Reads project in 3.6 will make secondaries filter results (so documents from a donated range should not be returned, even if they have yet to be deleted), this is still an issue on 3.4.

It's also relatively simple to fix: the call to repl::ReplClientInfo::forClient(opCtx->getClient()).setLastOpToSystemLastOpTime(opCtx); should just happen after the moveChunk threads join, so that all of them wait for writeConcern for the correct opTime.

So, I suggest making the fix on 3.6 as well as 3.4.
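
To make the suggested ordering concrete, here is a minimal sketch of the fixed shape, reusing the hypothetical stand-ins from the sketch in the Description (again, not the actual patch; the real change landed in the commit linked above):

{code:cpp}
// Hypothetical stand-ins, as in the sketch in the Description.
struct OpTime { long long ts = 0; };
struct Client { OpTime lastOp; };

// Stub: run the migration if no one else is, or block until the active
// migration (including its range deletion) has finished.
void runOrJoinActiveMigration() {}

// Stub: the latest opTime applied on this node (the "system last opTime").
OpTime systemLastOpTime() { return OpTime{}; }

void moveChunk(Client* client) {
    runOrJoinActiveMigration();

    // The fix: every request, active or joined, sets its last opTime after
    // the join point, mirroring
    //   repl::ReplClientInfo::forClient(opCtx->getClient())
    //       .setLastOpToSystemLastOpTime(opCtx);
    client->lastOp = systemLastOpTime();

    // Every request's writeConcern wait now covers the range-deletion
    // deletes, no matter which request actually ran the migration.
}
{code}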
