[SERVER-30724] Initial sync might miss ops that were in flight when it started Created: 17/Aug/17 Updated: 30/Oct/23 Resolved: 21/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | 3.0.15, 3.2.16 |
| Fix Version/s: | 3.2.21 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Storage 2017-09-11 |
| Participants: |
| Description |
|
When initial sync starts, the node reads the last oplog entry "A" from its sync source and uses it as the starting point when it applies operations in a later phase. That read uses a reverse-scanning cursor instead of a forward cursor, so it is not subject to oplog visibility rules. As a result, there could be uncommitted operations prior to "A" that modify documents after the collection scan phase of initial sync has already passed them by. Such changes would never be applied to the initial-syncing node. |
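
The race described above can be illustrated with a toy model. This is only a sketch, not MongoDB code: `OplogEntry`, `last_entry_reverse_scan`, and `forward_visible` are hypothetical names, and the "visibility rule" is simplified to "a forward scan stops at the first uncommitted entry".

```python
from dataclasses import dataclass


@dataclass
class OplogEntry:
    ts: int          # position of the entry in the oplog (hypothetical timestamp)
    committed: bool  # False while the write is still in flight


def last_entry_reverse_scan(oplog):
    """Reverse scan: return the newest committed entry, ignoring holes before it."""
    for entry in reversed(oplog):
        if entry.committed:
            return entry
    return None


def forward_visible(oplog):
    """Forward scan: stop at the first uncommitted entry (simplified visibility rule)."""
    visible = []
    for entry in oplog:
        if not entry.committed:
            break
        visible.append(entry)
    return visible


# Entry 2 is still in flight when entry 3 has already committed.
oplog = [OplogEntry(1, True), OplogEntry(2, False), OplogEntry(3, True)]

start = last_entry_reverse_scan(oplog)               # plays the role of "A"
visible_ts = [e.ts for e in forward_visible(oplog)]  # what a forward scan would return
# Ops that sit before "A" but were not yet committed when "A" was read.
missed = [e.ts for e in oplog if e.ts < start.ts and not e.committed]
```

In this model the reverse scan picks entry 3 as the starting point even though entry 2 has not committed, so an applier that starts from entry 3 never sees entry 2.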
| Comments |
| Comment by Githook User [ 21/May/18 ] |
|
Author: {'username': 'judahschvimer', 'name': 'Judah Schvimer', 'email': 'judah@mongodb.com'} Message: |
| Comment by Eric Milkie [ 11/Sep/17 ] |
|
I believe that the easiest solution to fix version 3.2 would be to change the initial sync code to wait to start cloning until after the first data comes back from the oplog tailing, as Geert suggests. |
| Comment by Eric Milkie [ 11/Sep/17 ] |
|
This problem was fixed in master and 3.4 branches by |
| Comment by Ian Whalen (Inactive) [ 29/Aug/17 ] |
|
geert.bosch, note in re priority: this looks like one of our most frequent build failures at the moment so I'd probably recommend prioritizing this fix over investigating other (less frequent) failures. |
| Comment by Geert Bosch [ 28/Aug/17 ] |
|
Probably the simplest way to fix this is to start tailing the oplog, and receive the first batch, before starting the cloning. That way we know we'll see any ops that were in flight, as those would not be seen in a forward scan. This approach also has the benefit that it doesn't require any new features in the clone source, so it should not run into multi-version issues and is more suitable for backporting. |
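
The ordering proposed in this comment can be sketched with a small simulation. It assumes, as a simplification, that a tailing (forward) cursor cannot return the starting entry until every earlier entry has committed. All names here (`visible_prefix`, `first_batch`, `initial_sync`, `commit_in_flight`, `clone`) are hypothetical and do not correspond to server functions.

```python
def visible_prefix(oplog):
    """Timestamps a forward (tailing) cursor may return: everything before the first hole."""
    out = []
    for ts, committed in oplog:
        if not committed:
            break
        out.append(ts)
    return out


def first_batch(oplog, start_ts):
    """Return the first tailing batch from start_ts, or None if start_ts is not yet visible."""
    visible = visible_prefix(oplog)
    if start_ts not in visible:
        return None  # a real cursor would block here instead of returning None
    return [ts for ts in visible if ts >= start_ts]


def initial_sync(oplog, start_ts, commit_in_flight, clone):
    """Hypothetical ordering: tail first, clone only once the first batch has arrived."""
    while first_batch(oplog, start_ts) is None:
        commit_in_flight(oplog)  # stands in for waiting on the sync source
    return clone(oplog)


def commit_in_flight(log):
    """Mark the first in-flight entry as committed."""
    for i, (ts, committed) in enumerate(log):
        if not committed:
            log[i] = (ts, True)
            return


def clone(log):
    """The clone reflects every op that has committed by the time it runs."""
    return [ts for ts, committed in log if committed]


# Entry 2 is in flight when the starting point (entry 3) is chosen.
oplog = [(1, True), (2, False), (3, True)]
cloned = initial_sync(oplog, 3, commit_in_flight, clone)
```

Because cloning starts only after the first batch is returned, the clone in this model already includes the effect of entry 2.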
| Comment by Eric Milkie [ 18/Aug/17 ] |
|
To fix this in 3.4 and 3.2, we can make readAfterOpTime call waitForAllEarlierOplogWritesToBeVisible() when executing on a primary node, and then use readAfterOpTime read concern for the initial sync collection scans. Note that 3.2 doesn't have such a function but one could be added. |
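
As a rough illustration of what a "wait for all earlier oplog writes to be visible" primitive does, here is a toy version built on a condition variable. It is an assumption-laden sketch, not the server's implementation: `ToyOplog` and its methods are hypothetical, and only the name of the wait method echoes the function mentioned in this comment.

```python
import threading


class ToyOplog:
    """Toy oplog that tracks which timestamps are still in flight (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = set()
        self._next_ts = 1

    def begin_write(self):
        """Reserve a timestamp for a write that has not committed yet."""
        with self._cond:
            ts = self._next_ts
            self._next_ts += 1
            self._in_flight.add(ts)
            return ts

    def commit_write(self, ts):
        """Commit the write and wake any waiting readers."""
        with self._cond:
            self._in_flight.discard(ts)
            self._cond.notify_all()

    def wait_for_all_earlier_writes_visible(self, timeout=None):
        """Block until every write reserved before this call has committed.

        Returns True on success and False if the timeout expires first.
        """
        with self._cond:
            latest = self._next_ts - 1
            return self._cond.wait_for(
                lambda: all(ts > latest for ts in self._in_flight), timeout
            )


oplog = ToyOplog()
a = oplog.begin_write()
b = oplog.begin_write()
oplog.commit_write(b)  # the later write commits first, leaving a hole at `a`

blocked = oplog.wait_for_all_earlier_writes_visible(timeout=0.01)

# Commit the earlier write from another thread shortly afterwards.
threading.Timer(0.05, oplog.commit_write, args=(a,)).start()
ready = oplog.wait_for_all_earlier_writes_visible(timeout=5)
```

A collection scan issued only after such a wait returns would, in this model, observe every write that precedes the chosen starting point.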
| Comment by Eric Milkie [ 17/Aug/17 ] |
|
One way to fix this in 3.6 is to use causal consistency for the initial sync collection scan phase to ensure that the scans are definitively after the last oplog entry's timestamp and all previous writes as well. |