[SERVER-29834] only the active moveChunk on a shard will set the last opTime to wait for writeConcern for Created: 23/Jun/17 Updated: 30/Oct/23 Resolved: 18/Jul/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.5, 3.5.9 |
| Fix Version/s: | 3.5.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v3.4
|
||||||||||||||||
| Sprint: | Sharding 2017-07-31 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||
| Description |
|
When multiple moveChunk commands "pile up" on a shard, only the first one actually runs, does the range deletion, and sets the last opTime on ReplClientInfo after performing the range deletion. Therefore, when the other moveChunk commands go to wait for write concern before returning, they are not waiting on an opTime that includes the range deletion deletes. So, it is possible (as occurred frequently in BF-5452) for a config stepdown to happen during a manual migration initiated through mongos; mongos to retry the manual migration; and the second manual migration to return before the range deletes have actually replicated. If mongos then performs a secondary read including the donated range (which in v3.4 is unversioned, so will be sent to the donor shard) the read can return duplicate documents (because they have not yet been deleted). This is true even if the moveChunk request had waitForDelete: true and writeConcern: majority, and the read had readConcern: majority. |
| Comments |
| Comment by Githook User [ 18/Jul/17 ] |
|
Author: {u'username': u'EshaMaharishi', u'name': u'Esha Maharishi', u'email': u'esha.maharishi@mongodb.com'}Message: |
| Comment by Esha Maharishi (Inactive) [ 23/Jun/17 ] |
|
Though the Safe Secondary Reads project in 3.6 will make secondaries filter results (so documents from a donated range should not be returned, even if they have yet to be deleted), this is still an issue on 3.4. It's also relatively simple to fix: the call to repl::ReplClientInfo::forClient(opCtx->getClient()).setLastOpToSystemLastOpTime(opCtx); should just happen after the moveChunk threads join, so that all of them wait for writeConcern for the correct opTime. So, I suggest making the fix on 3.6 as well as 3.4. |