[SERVER-32185] Freshly synced secondaries respond to queries before their "sync time" Created: 06/Dec/17 Updated: 27/Oct/23 Resolved: 08/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Participants: | |
| Description |
|
The last phase of a secondary performing initial sync is to apply oplog operations up through some time `T`, which represents when the collection cloning phase completed. It is incorrect for a secondary to respond to majority reads or read-at-a-timestamp queries at a timestamp before `T`. When a secondary comes out of initial sync, it still has a notion of the replica set's majority commit time. Because the majority commit time is translated to a "read at a timestamp", a commit time earlier than `T` means the secondary will incorrectly respond to the query with an inconsistent view of the data. A couple starting points for solutions:
|
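To make the constraint in the description concrete, here is a minimal Python sketch of a read-side guard. It is a toy model, not MongoDB's implementation: `SecondaryState`, its fields, and `majority_read_timestamp` are hypothetical names introduced for illustration, and timestamps are plain integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondaryState:
    # Hypothetical model of a secondary; these are not MongoDB internals.
    initial_data_timestamp: int               # time T: end of collection cloning
    majority_commit_timestamp: Optional[int]  # None if no commit point is known yet


def majority_read_timestamp(state: SecondaryState) -> Optional[int]:
    """Return the timestamp a majority read may use, or None if it must wait.

    A read at any timestamp earlier than T would see data that was cloned
    non-atomically and not yet fixed up by oplog application.
    """
    commit = state.majority_commit_timestamp
    if commit is None or commit < state.initial_data_timestamp:
        return None  # no consistent majority view is available yet
    return commit
```

With `T = 10`, a commit point of `7` yields `None` (the read should block or fail), while a commit point of `12` can be served at timestamp `12`.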
| Comments |
| Comment by Daniel Gottlieb (Inactive) [ 08/Dec/17 ] |
|
I think I flubbed making this ticket. After talking with judah.schvimer and taking a fresh look at reproducing this with logs on master, I think what I was observing was really |
| Comment by Judah Schvimer [ 08/Dec/17 ] |
|
Per conversation with daniel.gottlieb, it appears that doing majority reads at the stable timestamp should be sufficient. We seem to be doing exactly this, so it's unclear what's going on. This will still leave us open to a rollback right after initial sync requiring a resync. From |
| Comment by Judah Schvimer [ 06/Dec/17 ] |
|
I like the user-visible behavior of staying in initial sync rather than restarting initial sync if we roll back shortly after leaving initial sync. My only concern is relying on secondaries in initial sync to commit writes. Per conversation with milkie, this is no different from the behavior we have today, and it should work, but it definitely feels weird. |
| Comment by Daniel Gottlieb (Inactive) [ 06/Dec/17 ] |
That may be a reasonable option as well. I didn't think replication kept its "moved out of initial sync" time and that's why we introduced a setInitialDataTimestamp time. But, if I'm wrong, I don't see any fundamental reason why your suggestion wouldn't work. |
| Comment by Eric Milkie [ 06/Dec/17 ] |
|
In 3.6 we no longer create any named snapshots, so there is no longer any "blessing" mechanism – the logic is completely different now. |
| Comment by Daniel Gottlieb (Inactive) [ 06/Dec/17 ] |
|
3.6 yes, 3.4 I don't think so. This should be backported, yes. |
| Comment by Judah Schvimer [ 06/Dec/17 ] |
|
Why are we blessing snapshots as "committed" if they're inconsistent? I think we already have a mechanism for blocking reads when no majority snapshot is available. I suggest we just never set the committed snapshot to an inconsistent snapshot. We should be able to do something similar to |
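The suggestion in this comment, never marking an inconsistent snapshot as committed and letting reads block while none is available, could be sketched roughly as follows. This is an illustrative Python toy under stated assumptions: `CommittedSnapshotTracker`, `consistent_from`, and the integer timestamps are hypothetical and do not correspond to actual server code.

```python
from typing import Optional


class CommittedSnapshotTracker:
    """Toy model of tracking which snapshot majority reads may use."""

    def __init__(self, consistent_from: int) -> None:
        # Hypothetical: the time T before which data is inconsistent.
        self.consistent_from = consistent_from
        self.committed: Optional[int] = None

    def advance(self, candidate: int) -> bool:
        """Adopt `candidate` as the committed snapshot if it is consistent and newer."""
        if candidate < self.consistent_from:
            return False  # inconsistent snapshot: never mark it committed
        if self.committed is not None and candidate <= self.committed:
            return False  # the committed snapshot only moves forward
        self.committed = candidate
        return True

    def snapshot_for_read(self) -> Optional[int]:
        """Return the committed snapshot, or None if readers must wait."""
        return self.committed
```

In this model the decision is made once, when the committed snapshot is updated, so readers only need to handle the "no snapshot yet" case.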
| Comment by Judah Schvimer [ 06/Dec/17 ] |
|
daniel.gottlieb, does this affect 3.6, and does it need to be backported? |