[SERVER-31700] waitForAllEarlierOplogWritesToBeVisible might hang after rollback, in rare cases Created: 24/Oct/17 Updated: 27/Oct/23 Resolved: 01/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Eric Milkie |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Sprint: | Execution Team 2019-11-04 |
| Participants: |
| Description |
|
When in waitForAllEarlierOplogWritesToBeVisible, the method first records the "oplog read timestamp": https://github.com/mongodb/mongo/blob/43b018194e1a49249092b871f3b7396e473d1426/src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp#L82-L84

Every iteration of the wait loop compares this recorded value with a current reading. If the current value has gone backwards, that signals that a rollback happened, and the method breaks out of the wait loop.

However, because those values are just timestamps and not OpTimes (which contain terms), it's possible for a rollback to go undetected. For example, consider the "oplog read timestamp" starting at 10 while the method is waiting for time 15 to become visible. During one wait on the condition variable, it's possible for the "oplog read timestamp" to advance to time 13, followed by a rollback that sets the time back to 11. The value observed on wakeup (11) is still greater than the recorded value (10), so the rollback would not be detected.

I don't believe this is a matter of correctness; I expect callers must be doing rollback detection (or alternatively, operation contexts waiting on a condition variable may be getting signaled on rollback in a way that throws an exception?). This is just a matter of liveness. In a typical case, the timestamp the reader is waiting to become visible is already committed and the system is just waiting for earlier holes (i.e., concurrent, in-progress transactions) to be resolved. With the code as it is, in the case of a rollback, the history up to the previous "oplog read timestamp" is destroyed. The liveness guarantee of this wait for the now-defunct "oplog read timestamp" is then predicated on new activity coming into the system. |
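The 10 / 13 / 11 example can be replayed with a small standalone model of the wait loop. This is only a sketch under stated assumptions, not the code in wiredtiger_oplog_manager.cpp: `WaitOutcome` and `simulateWait` are hypothetical names, and each vector element stands for the "oplog read timestamp" the waiter would observe on one wakeup of the condition variable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical outcome of the modeled wait loop (not a server type).
enum class WaitOutcome { kVisible, kRollbackDetected, kStillWaiting };

// Models the timestamp-only check described in the ticket.
// recordedAtEntry: the "oplog read timestamp" recorded when the wait began.
// waitingFor: the timestamp the caller needs to become visible.
// readTimestampAtWakeups: the value observed on each wakeup, in order.
WaitOutcome simulateWait(std::uint64_t recordedAtEntry,
                         std::uint64_t waitingFor,
                         const std::vector<std::uint64_t>& readTimestampAtWakeups) {
    assert(recordedAtEntry <= waitingFor);  // assumption of this sketch
    for (std::uint64_t current : readTimestampAtWakeups) {
        // A rollback is inferred only if the value went below the recorded one.
        if (current < recordedAtEntry) {
            return WaitOutcome::kRollbackDetected;
        }
        if (current >= waitingFor) {
            return WaitOutcome::kVisible;
        }
    }
    // No further wakeups: the waiter is still blocked.
    return WaitOutcome::kStillWaiting;
}
```

With the ticket's numbers, `simulateWait(10, 15, {11})` models a single wakeup that sees 11 after the advance to 13 and the rollback to 11 both happened during one wait. Because 11 is not below 10, the model keeps waiting. A rollback to 9 (`simulateWait(10, 15, {9})`) would be detected.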
| Comments |
| Comment by Eric Milkie [ 01/Nov/19 ] |
|
I believe that on rollback, everything is signaled that could be blocked waiting in this function, and thus this is no longer an issue. |
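One way to picture "everything is signaled on rollback" is a waiter that wakes on either visibility or a rollback notification, so it no longer depends on comparing timestamps. The following is an illustrative sketch only and does not reflect the server's actual mechanism: `OplogVisibilitySketch`, its rollback generation counter, and all member names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical model: a rollback bumps a generation counter and wakes all waiters.
class OplogVisibilitySketch {
public:
    std::uint64_t rollbackGeneration() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _rollbackGeneration;
    }

    // Moves the read timestamp forward (never backwards) and wakes waiters.
    void advanceReadTimestamp(std::uint64_t ts) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (ts > _readTimestamp) _readTimestamp = ts;
        }
        _cv.notify_all();
    }

    // Sets the read timestamp back and signals every blocked waiter.
    void rollbackTo(std::uint64_t ts) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            assert(ts <= _readTimestamp);  // a rollback only moves time back
            _readTimestamp = ts;
            ++_rollbackGeneration;
        }
        _cv.notify_all();
    }

    // Returns true if waitingFor became visible, false if a rollback occurred
    // since the caller read generationAtEntry.
    bool waitUntilVisible(std::uint64_t waitingFor, std::uint64_t generationAtEntry) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] {
            return _rollbackGeneration != generationAtEntry || _readTimestamp >= waitingFor;
        });
        return _rollbackGeneration == generationAtEntry;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _readTimestamp = 0;
    std::uint64_t _rollbackGeneration = 0;
};
```

In this model, a caller that reads the generation at timestamp 10, then sees an advance to 13 and a rollback to 11, returns `false` from `waitUntilVisible(15, generation)` instead of blocking, even though 11 is greater than 10.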
| Comment by Spencer Brody (Inactive) [ 25/Oct/17 ] |
|
Okay, so long as waitForAllEarlierOplogWritesToBeVisible respects opCtx interruption then it should be fine. |
| Comment by Eric Milkie [ 25/Oct/17 ] |
|
However, I would presume all remote queries from secondaries would use a socket timeout or maxTimeMS, and so they wouldn't hang forever anyway. |
| Comment by Eric Milkie [ 25/Oct/17 ] |
|
Maybe? You would have to call that function prior to rollback and then continue to be waiting after rollback, and I am presuming that such remote queries on secondaries would be disconnected as part of the rollback process. |
| Comment by Spencer Brody (Inactive) [ 25/Oct/17 ] |
|
We use waitForAllEarlierOplogWritesToBeVisible when serving reads on the oplog - can this cause secondaries to stop replicating? |