[SERVER-30724] Initial sync might miss ops that were in flight when it started Created: 17/Aug/17  Updated: 30/Oct/23  Resolved: 21/May/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: 3.0.15, 3.2.16
Fix Version/s: 3.2.21

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
causes SERVER-38425 Oplog Visibility Query is a collectio... Closed
Related
related to SERVER-37408 Add afterClusterTime to initial sync ... Closed
related to SERVER-37468 Problem with initial sync when adding... Closed
is related to SERVER-30927 Use readConcern afterClusterTime for ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage 2017-09-11
Participants:

 Description   

When initial sync starts, the node reads the last oplog entry "A" from its sync source and uses that as its starting point for applying operations in a later phase. That read is not subject to oplog visibility rules because it uses a reverse scanning cursor rather than a forward cursor, so there may still be uncommitted operations with timestamps earlier than "A". If such an operation later commits and affects a document that the collection scan (clone) phase has already passed, the change is never applied to the initial-syncing node.
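
As an illustration only (the server performs this read internally in C++), the read described above corresponds to a reverse natural-order query on the oplog. A minimal pymongo sketch, with a placeholder connection string, might look like this:

{code:python}
from pymongo import MongoClient, DESCENDING

# Placeholder sync source; the server does the equivalent read internally.
client = MongoClient("mongodb://sync-source:27017")
oplog = client.local["oplog.rs"]

# Reverse natural-order scan: returns the newest oplog entry on disk.
# Unlike a forward scan, it is not held back by earlier, still-uncommitted
# entries, so operations in flight before this timestamp can be missed.
last_entry = oplog.find_one(sort=[("$natural", DESCENDING)])
start_ts = last_entry["ts"]  # used as the apply-phase starting point "A"
{code}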



 Comments   
Comment by Githook User [ 21/May/18 ]

Author:

{'username': 'judahschvimer', 'name': 'Judah Schvimer', 'email': 'judah@mongodb.com'}

Message: SERVER-30724 Initial sync waits for oplog visibility before beginning clone
Branch: v3.2
https://github.com/mongodb/mongo/commit/b64de307169891f859c29f207e712ed0eb3cd2a2

Comment by Eric Milkie [ 11/Sep/17 ]

I believe the easiest fix for version 3.2 would be to change the initial sync code to wait to start cloning until after the first data comes back from the oplog tailing, as Geert suggests.

Comment by Eric Milkie [ 11/Sep/17 ]

This problem was fixed on the master and 3.4 branches by SERVER-30927; only the 3.2 branch still needs a fix under this ticket.

Comment by Ian Whalen (Inactive) [ 29/Aug/17 ]

geert.bosch, a note on priority: this looks like one of our most frequent build failures at the moment, so I'd recommend prioritizing this fix over investigating other (less frequent) failures.

Comment by Geert Bosch [ 28/Aug/17 ]

Probably the simplest way to fix this is to just start tailing the oplog, and receive the first batch, before starting the clone. That way we know we'll see any ops that were in flight, since a forward scan cannot skip past them the way the reverse scan can.

This approach also has the benefit that it doesn't require any new features on the sync source, so it should not run into multi-version issues and is more suitable for backporting.
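
A rough pymongo sketch of the ordering described here (the real change lives in the server's C++ initial sync code; the host name and collection handling are assumptions for the example):

{code:python}
from pymongo import MongoClient, CursorType

client = MongoClient("mongodb://sync-source:27017")  # placeholder sync source
oplog = client.local["oplog.rs"]

# 1. Pick the starting point: the last visible oplog entry.
start_ts = oplog.find_one(sort=[("$natural", -1)])["ts"]

# 2. Open the tailing cursor and wait for its first document. A forward
#    oplog scan does not return entries past a still-uncommitted write, so
#    once a document arrives, any ops that were in flight at step 1 have
#    become visible and the clone phase will observe their effects.
tail = oplog.find({"ts": {"$gte": start_ts}},
                  cursor_type=CursorType.TAILABLE_AWAIT)
first_op = next(tail)

# 3. Only now begin the collection clone (scan) phase.
{code}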

Comment by Eric Milkie [ 18/Aug/17 ]

To fix this in 3.4 and 3.2, we can make readAfterOpTime call waitForAllEarlierOplogWritesToBeVisible() when executing on a primary node, and then use readAfterOpTime read concern for the initial sync collection scans. Note that 3.2 doesn't have such a function but one could be added.
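
As a rough illustration of the semantics involved (not the server's implementation; all names here are invented for the sketch), a toy model of an oplog visibility point with a blocking wait could look like this:

{code:python}
import threading

class OplogVisibilityModel:
    """Toy model: writers reserve oplog timestamps in order but may commit
    out of order; readers must not observe entries past the oldest hole."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = set()    # reserved but not yet committed timestamps
        self._max_reserved = 0
        self._all_visible = 0    # every ts <= this value is visible

    def reserve(self, ts):
        with self._cond:
            self._pending.add(ts)
            self._max_reserved = max(self._max_reserved, ts)

    def commit(self, ts):
        with self._cond:
            self._pending.discard(ts)
            # Visibility advances to just before the oldest remaining hole.
            floor = min(self._pending) - 1 if self._pending else self._max_reserved
            self._all_visible = max(self._all_visible, floor)
            self._cond.notify_all()

    def wait_for_all_earlier_writes_to_be_visible(self, ts):
        """Block until every write with a timestamp <= ts is visible,
        analogous to the guarantee waitForAllEarlierOplogWritesToBeVisible()
        would provide before a read at that optime proceeds."""
        with self._cond:
            self._cond.wait_for(lambda: self._all_visible >= ts)
{code}

A collection scan reading at optime T would call wait_for_all_earlier_writes_to_be_visible(T) before scanning, which is the guarantee the initial sync collection scans need here.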

Comment by Eric Milkie [ 17/Aug/17 ]

One way to fix this in 3.6 is to use causal consistency for the initial sync collection scan phase, ensuring that the scans take place after the last oplog entry's timestamp and therefore observe all earlier writes as well.
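
For example, a client-side analogue using pymongo's causally consistent sessions (available against 3.6+ servers) might look like the following; the host, database, and collection names are placeholders, and the server-side change would differ in detail:

{code:python}
from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://sync-source:27017")  # placeholder sync source

# Latest oplog entry on the source.
last = client.local["oplog.rs"].find_one(sort=[("$natural", DESCENDING)])

with client.start_session(causal_consistency=True) as session:
    # Advance the session past that entry's timestamp so subsequent reads
    # in the session are causally after it and after every earlier write.
    session.advance_operation_time(last["ts"])

    # Collection scan for the clone phase, now guaranteed to reflect all
    # writes up to and including last["ts"].
    for doc in client["mydb"]["mycoll"].find(session=session):
        pass  # copy the document to the syncing node
{code}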
