[SERVER-31700] waitForAllEarlierOplogWritesToBeVisible might hang after rollback, in rare cases Created: 24/Oct/17  Updated: 27/Oct/23  Resolved: 01/Nov/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Eric Milkie
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Sprint: Execution Team 2019-11-04
Participants:

 Description   

On entry, waitForAllEarlierOplogWritesToBeVisible first records the "oplog read timestamp": https://github.com/mongodb/mongo/blob/43b018194e1a49249092b871f3b7396e473d1426/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp#L82-L84

Every iteration of the wait loop compares this recorded value with a current reading. If the current reading is lower than the recorded value, that signals a rollback happened, and the method breaks out of the wait loop:
https://github.com/mongodb/mongo/blob/43b018194e1a49249092b871f3b7396e473d1426/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp#L102-L109

However, because those values are just timestamps and not OpTimes (which contain terms), it's possible for a rollback to go undetected. For example, consider the "oplog read timestamp" starting at 10 while the method is waiting for time 15 to become visible. During a single wait on the condition variable, it's possible for the "oplog read timestamp" to advance to time 13, followed by a rollback that sets the time back to 11. The waiter wakes up and sees 11, which is still greater than the recorded 10, so the rollback is not detected.
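
The 10 -> 13 -> 11 scenario can be sketched in isolation. This is a minimal model, not the server's code: `Reading`, `rollbackCount`, and both helper functions are hypothetical names introduced for illustration. The counter stands in for any extra piece of state (such as a term) that changes on rollback.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical snapshot of what a waiter observes each time it wakes up.
struct Reading {
    std::uint64_t readTimestamp;  // the "oplog read timestamp"
    std::uint64_t rollbackCount;  // hypothetical counter bumped on every rollback
};

// Timestamp-only check, as described in the ticket: a rollback is inferred
// only when a reading is lower than the value recorded on entry.
bool detectsRollbackByTimestamp(std::uint64_t recordedTimestamp,
                                const std::vector<Reading>& wakeups) {
    for (const Reading& r : wakeups) {
        if (r.readTimestamp < recordedTimestamp) {
            return true;
        }
    }
    return false;
}

// Alternative check (an assumption, not what the server does): a rollback is
// inferred whenever the counter differs from the value recorded on entry.
bool detectsRollbackByCounter(std::uint64_t recordedCount,
                              const std::vector<Reading>& wakeups) {
    for (const Reading& r : wakeups) {
        if (r.rollbackCount != recordedCount) {
            return true;
        }
    }
    return false;
}
```

With a recorded timestamp of 10 and a single wakeup that observes `{11, 1}` (the advance to 13 happened and was rolled back while the waiter slept), the timestamp-only check reports no rollback, while the counter-based check does.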

I don't believe this is a matter of correctness; I expect callers must be doing rollback detection (or alternatively, operation contexts waiting on a condition variable may be getting signaled on rollback in a way that throws an exception?).

This is just a matter of liveness. In a typical case, the timestamp the reader is waiting to become visible is already committed and the system is just waiting for earlier holes (i.e., concurrent, in-progress transactions) to be resolved.

With the code as it is, in the case of a rollback, the history up to the previous "oplog read timestamp" is destroyed. The liveness guarantee of this wait for the now-defunct "oplog read timestamp" is predicated on new activity coming into the system.
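
One way to picture a wait that does not depend on new activity is to have rollback itself wake the waiters. The following is a rough sketch under that assumption; `VisibilityTracker` and all of its members are hypothetical and do not correspond to the actual WiredTigerOplogManager interface.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical model of oplog visibility; not the server's API.
class VisibilityTracker {
public:
    // Callers snapshot this before waiting so a later rollback can be noticed.
    std::uint64_t rollbackGeneration() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _rollbackGeneration;
    }

    // Normal path: holes resolve and the read timestamp moves forward.
    void advanceTo(std::uint64_t ts) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (ts > _readTimestamp) {
                _readTimestamp = ts;
            }
        }
        _cv.notify_all();
    }

    // Rollback path: move the read timestamp back, bump the generation,
    // and wake every waiter so none depends on future writes to make progress.
    void rollbackTo(std::uint64_t ts) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _readTimestamp = ts;
            ++_rollbackGeneration;
        }
        _cv.notify_all();
    }

    // Returns true if `target` became visible with no rollback since
    // `startGeneration` was read; returns false if a rollback intervened.
    bool waitUntilVisible(std::uint64_t target, std::uint64_t startGeneration) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] {
            return _readTimestamp >= target || _rollbackGeneration != startGeneration;
        });
        return _rollbackGeneration == startGeneration;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _readTimestamp = 0;
    std::uint64_t _rollbackGeneration = 0;
};
```

In this model a waiter that recorded its generation at timestamp 10 and is waiting for 15 returns `false` as soon as `rollbackTo(11)` runs, even though 11 is greater than 10 and no further writes arrive.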



 Comments   
Comment by Eric Milkie [ 01/Nov/19 ]

I believe that on rollback, everything is signaled that could be blocked waiting in this function, and thus this is no longer an issue.

Comment by Spencer Brody (Inactive) [ 25/Oct/17 ]

Okay, so long as waitForAllEarlierOplogWritesToBeVisible respects opCtx interruption then it should be fine.

Comment by Eric Milkie [ 25/Oct/17 ]

However, I would presume all remote queries from secondaries would use a socket timeout or maxTimeMS, and so they wouldn't hang forever anyway.

Comment by Eric Milkie [ 25/Oct/17 ]

Maybe? You would have to call that function prior to rollback and then continue to be waiting after rollback, and I am presuming that such remote queries on secondaries would be disconnected as part of the rollback process.

Comment by Spencer Brody (Inactive) [ 25/Oct/17 ]

We use waitForAllEarlierOplogWritesToBeVisible when serving reads on the oplog - can this cause secondaries to stop replicating?
