[SERVER-30771] investigate oplog stones initialization Created: 22/Aug/17 Updated: 23/Sep/19 Resolved: 23/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | Execution Team 2019-07-15, Execution Team 2019-07-29, Execution Team 2019-08-12, Execution Team 2019-08-26, Execution Team 2019-09-09, Execution Team 2019-09-23 | ||||||||
| Participants: | |||||||||
| Description |
|
For some reason, oplog stones initialization reads from a data collection and then from the oplog collection in one transaction, which prevents our code from enforcing that oplog collection reads must be the first in a transaction (so that oplog visibility rules can be enacted at the beginning of the txn). |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 23/Sep/19 ] | ||||
|
I am closing this ticket after creating | ||||
| Comment by Dianna Hohensee (Inactive) [ 20/Sep/19 ] | ||||
|
Another interesting twist is in ReplicationRecoveryImpl::_truncateOplogTo, wherein we use a backward (as opposed to forward) oplog cursor, followed by a forward oplog cursor when we call Collection::cappedTruncateAfter. We bypass the getCursor check here because we hold the MODE_X collection lock. | ||||
| Comment by Dianna Hohensee (Inactive) [ 20/Sep/19 ] | ||||
|
I've figured out how to solve the above problems by raising the WT-specific isOplogReader function into recovery_unit.h/cpp. I've encountered a few more problem code paths. The most significant issue that has arisen is that we cannot keep the X lock override of the rules here. Now that we check in setIsOplogReader that we're either not in an active txn or isOplogReader is already set, we cannot support special WT bypasses for when the MODE_X collection lock is held. | ||||
| Comment by Daniel Solnik (Inactive) [ 26/Jul/19 ] | ||||
|
After adding in the invariant and running mongod with the following command:
I was able to reproduce the bug. In _calculateStones, a cursor is opened using StandardWiredTigerRecordStore::getCursor, and opening this cursor leads to a call to WiredTigerRecoveryUnit::setIsOplogReader, which causes the crash. Looking further, there is an existing invariant in StandardWiredTigerRecordStore::getCursor that checks that either we are not in an active transaction or the oplog is locked for X. At the time getCursor is called, the oplog is locked for X, so that invariant does not fail. The newly added invariant then fails immediately afterwards, when setIsOplogReader is called. Adding an abandonSnapshot call directly after the getCursor invariant and right before setIsOplogReader fixes the problem when I run mongod locally, but causes failures in other tests. After looking at the git blame, I'm still unsure why the second part of the getCursor invariant (the X lock case) is there, and I would like to understand it. Here is the stack trace for the failure upon start:
| ||||
| Comment by Eric Milkie [ 26/Jun/19 ] | ||||
|
(actually it doesn't crash on startup, it crashes on replica set initiate, since I was starting with empty data files) | ||||
| Comment by Eric Milkie [ 26/Jun/19 ] | ||||
|
I tried adding this invariant myself, and the code still crashes on startup in WiredTigerRecordStore::OplogStones::_calculateStones(), so there must be some code path in the initialization for oplog stones that is reusing a transaction to read out of the oplog. We probably just need an abandonSnapshot() somewhere. | ||||
| Comment by Eric Milkie [ 26/Jun/19 ] | ||||
|
The idea would be to eventually add a check in WiredTigerRecoveryUnit::setIsOplogReader():
Otherwise, if we set _isOplogReader after the transaction has started, it's too late to set _oplogVisibleTs in that same class instance (it's set in txnOpen()). _oplogVisibleTs therefore remains unset, and we do not apply any oplog visibility rules when we then open a cursor on the oplog (see the constructor for WiredTigerRecordStoreCursorBase). | ||||
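The ordering problem described in the comment above can be illustrated with a minimal, self-contained C++17 sketch. This is not the WiredTigerRecoveryUnit code and not the check that was proposed verbatim: ToyRecoveryUnit, readDataThenOplog, and the use of an exception in place of an invariant are all hypothetical, and resetting the reader flag in abandonSnapshot() is an assumption made for the toy model. Only the member names (_isOplogReader, _oplogVisibleTs, txnOpen, abandonSnapshot, setIsOplogReader) are borrowed from the ticket for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Toy stand-in for a recovery unit. Hypothetical; NOT MongoDB's real class.
class ToyRecoveryUnit {
public:
    explicit ToyRecoveryUnit(std::uint64_t currentOplogVisibleTs)
        : _currentOplogVisibleTs(currentOplogVisibleTs) {}

    // Sketch of the check: the flag may only be set outside an active
    // transaction, unless it is already set. The ticket talks about an
    // invariant; an exception is used here so the failure can be observed.
    void setIsOplogReader() {
        if (_active && !_isOplogReader) {
            throw std::logic_error("setIsOplogReader called inside an active transaction");
        }
        _isOplogReader = true;
    }

    // The visibility timestamp is captured only when the transaction opens,
    // which is why the flag has to be set beforehand.
    void txnOpen() {
        if (_active) return;
        if (_isOplogReader) _oplogVisibleTs = _currentOplogVisibleTs;
        _active = true;
    }

    // Assumption for this toy: abandoning the snapshot ends the transaction
    // and clears all per-transaction state, including the reader flag.
    void abandonSnapshot() {
        _active = false;
        _isOplogReader = false;
        _oplogVisibleTs.reset();
    }

    bool inActiveTxn() const { return _active; }
    std::optional<std::uint64_t> oplogVisibleTs() const { return _oplogVisibleTs; }

private:
    std::uint64_t _currentOplogVisibleTs;
    bool _active = false;
    bool _isOplogReader = false;
    std::optional<std::uint64_t> _oplogVisibleTs;
};

// Models "read a data collection, then the oplog". Returns true only if the
// oplog read ends up with a visibility timestamp in place.
bool readDataThenOplog(ToyRecoveryUnit& ru, bool abandonFirst) {
    assert(!ru.inActiveTxn());  // precondition: start from a fresh unit
    ru.txnOpen();               // the data collection read opens a transaction
    if (abandonFirst) ru.abandonSnapshot();
    try {
        ru.setIsOplogReader();
    } catch (const std::logic_error&) {
        return false;           // the check caught the late flag change
    }
    ru.txnOpen();               // the oplog cursor opens a fresh transaction
    return ru.oplogVisibleTs().has_value();
}
```

In this model, calling readDataThenOplog(ru, false) trips the check because the transaction is already open, while readDataThenOplog(ru, true) abandons the snapshot first, so the second txnOpen() can capture the visibility timestamp.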
| Comment by Dianna Hohensee (Inactive) [ 19/Jun/19 ] | ||||
|
milkie, could you provide more context on this task? I'm not sure what you mean by 'oplog stones initialization', or where exactly we read from a data collection and then the oplog collection. I also don't know in what code we want to enforce that oplog collection reads occur before other collection ops in a transaction – this sounds like something we'd want to add as part of this task if successful, and would help with understanding. |