[SERVER-30771] investigate oplog stones initialization Created: 22/Aug/17  Updated: 23/Sep/19  Resolved: 23/Sep/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Eric Milkie Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes SERVER-43412 Create a wrapper around getCursor cal... Closed
Sprint: Execution Team 2019-07-15, Execution Team 2019-07-29, Execution Team 2019-08-12, Execution Team 2019-08-26, Execution Team 2019-09-09, Execution Team 2019-09-23
Participants:

 Description   

For some reason, it is reading from a data collection and then from the oplog collection in one transaction, which is preventing our code from enforcing that oplog collection reads must be the first in a transaction (so that oplog visibility rules can be enacted at the beginning of the txn).



 Comments   
Comment by Dianna Hohensee (Inactive) [ 23/Sep/19 ]

I am closing this ticket after creating SERVER-43412 to try a new approach, based upon findings here.

Comment by Dianna Hohensee (Inactive) [ 20/Sep/19 ]

Another interesting twist in ReplicationRecoveryImpl::_truncateOplogTo , wherein we use a backwards (as opposed to forward) oplog cursor, followed by a forward oplog cursor when we Collection:: cappedTruncateAfter.

We bypass the getCursor check here because we hold the MODE_X collection lock.

Comment by Dianna Hohensee (Inactive) [ 20/Sep/19 ]

I've figured out how to solve the above problems by raising the WT specific isOplogReader function into recovery_unit.h/cpp. I've encountered a few more problem code paths. The most significant issue that has arisen is that we cannot keep the X lock override of the rules here. Now that we check in setIsOplogReader that we're either not in an active txn or isOplogReader is set, we cannot support special WT bypasses for when the mode X collection lock is held.

Comment by Daniel Solnik (Inactive) [ 26/Jul/19 ]

After adding in the invariant and running mongod with the following command:

./mongod --replSet foo

I was able to reproduce the bug. 

In _calculateStones, a cursor is opened using StandardWiredTigerRecordStore::getCursor and this cursor opening leads to a call to WiredTigerRecoveryUnit::setIsOplogReader which causes the crash.

Looking further in, there is an invariant in StandardWiredTigerRecordStore::getCursor that checks that either we are not in an active transaction or the opLog is locked for X. At the time that getCursor is called, the opLog is locked for X and so the invariant does not fail. This however, causes the invariant to fail immediately after when setIsOplogReader is called. 

Adding in an abandon snapshot directly after this invariant and right before setIsOplogReader fixes the problem when I run mongod locally but causes failures in other tests.

After looking at the git blame and trying to figure out why this second part of the invariant is there I'm still unsure and would like to understand why this is here. 

Here is the stack trace for the failure upon start:

mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x17E) [0x106eb766e] mongod(_ZN5mongo21invariantWithLocationIbEEvRKT_PKcS5_j+0x3C) [0x102297fdc] mongod(_ZN5mongo22WiredTigerRecoveryUnit16setIsOplogReaderEv+0x4D) [0x1023d1b4d] mongod(_ZNK5mongo29StandardWiredTigerRecordStore9getCursorEPNS_16OperationContextEb+0x1CC) [0x1023d5bec] mongod(_ZN5mongo21WiredTigerRecordStore11OplogStones26_calculateStonesByScanningEPNS_16OperationContextE+0xC5) [0x1023bfcc5] mongod(_ZN5mongo21WiredTigerRecordStore11OplogStones16_calculateStonesEPNS_16OperationContextEm+0x183) [0x1023bdb83] mongod(_ZN5mongo21WiredTigerRecordStore11OplogStonesC2EPNS_16OperationContextEPS0_+0x258) [0x1023bd808] mongod(_ZN5mongo21WiredTigerRecordStore11OplogStonesC1EPNS_16OperationContextEPS0_+0x25) [0x1023bdd05] mongod(_ZNSt3__122__compressed_pair_elemIN5mongo21WiredTigerRecordStore11OplogStonesELi1ELb0EEC2IJRPNS1_16OperationContextEOPS2_EJLm0ELm1EEEENS_21piecewise_construct_tENS_5tupleIJDpT_EEENS_15__tuple_indicesIJXspT0_EEEE+0x66) [0x1023ea6a6] mongod(_ZNSt3__117__compressed_pairINS_9allocatorIN5mongo21WiredTigerRecordStore11OplogStonesEEES4_EC2IJRS5_EJRPNS2_16OperationContextEOPS3_EEENS_21piecewise_construct_tENS_5tupleIJDpT_EEENSF_IJDpT0_EEE+0x86) [0x1023ea596] mongod(_ZNSt3__117__compressed_pairINS_9allocatorIN5mongo21WiredTigerRecordStore11OplogStonesEEES4_EC1IJRS5_EJRPNS2_16OperationContextEOPS3_EEENS_21piecewise_construct_tENS_5tupleIJDpT_EEENSF_IJDpT0_EEE+0x2D) [0x1023ea16d] mongod(_ZNSt3__120__shared_ptr_emplaceIN5mongo21WiredTigerRecordStore11OplogStonesENS_9allocatorIS3_EEEC2IJRPNS1_16OperationContextEPS2_EEES5_DpOT_+0x9C) [0x1023ea00c] mongod(_ZNSt3__120__shared_ptr_emplaceIN5mongo21WiredTigerRecordStore11OplogStonesENS_9allocatorIS3_EEEC1IJRPNS1_16OperationContextEPS2_EEES5_DpOT_+0x25) [0x1023e9c65] mongod(_ZNSt3__110shared_ptrIN5mongo21WiredTigerRecordStore11OplogStonesEE11make_sharedIJRPNS1_16OperationContextEPS2_EEES4_DpOT_+0xED) [0x1023e9a2d] mongod(_ZNSt3__1L11make_sharedIN5mongo21WiredTigerRecordStore11OplogStonesEJRPNS1_16OperationContextEPS2_EEENS_9enable_ifIXntsr8is_arrayIT_EE5valueENS_10shared_ptrIS9_EEE4typeEDpOT0_+0x44) [0x1023c45f4] mongod(_ZN5mongo21WiredTigerRecordStore19postConstructorInitEPNS_16OperationContextE+0xA99) [0x1023c4389] mongod(_ZN5mongo18WiredTigerKVEngine21getGroupedRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsENS_8KVPrefixE+0x336) [0x102383c16] mongod(_ZN5mongo17StorageEngineImpl15_initCollectionEPNS_16OperationContextERKNS_15NamespaceStringEb+0x4B0) [0x103a43c30] mongod(_ZN5mongo17StorageEngineImpl11loadCatalogEPNS_16OperationContextE+0x1C6B) [0x103a41f7b] mongod(_ZN5mongo17StorageEngineImplC2EPNS_8KVEngineENS_20StorageEngineOptionsE+0x4BB) [0x103a3ff2b] mongod(_ZN5mongo17StorageEngineImplC1EPNS_8KVEngineENS_20StorageEngineOptionsE+0x6E) [0x103a423ce] mongod(_ZNK5mongo12_GLOBAL__N_117WiredTigerFactory6createERKNS_19StorageGlobalParamsEPKNS_21StorageEngineLockFileE+0x79F) [0x10231c56f] mongod(_ZN5mongo23initializeStorageEngineEPNS_14ServiceContextENS_22StorageEngineInitFlagsE+0xF3D) [0x103a844dd] mongod(_ZN5mongo12_GLOBAL__N_114_initAndListenEi+0xB48) [0x1022ac378] mongod(_ZN5mongo12_GLOBAL__N_113initAndListenEi+0x2A) [0x10229a7ba] mongod(_ZN5mongo12_GLOBAL__N_111mongoDbMainEiPPcS2_+0x4D0) [0x102292080] mongod(main+0x2A) [0x102291b9a]
 

 

Comment by Eric Milkie [ 26/Jun/19 ]

(actually it doesn't crash on startup, it crashes on replica set initiate, since I was starting with empty data files)

Comment by Eric Milkie [ 26/Jun/19 ]

I tried adding this invariant myself, and the code still crashes on startup in WiredTigerRecordStore::OplogStones::_calculateStones(), so there must be some code path in the initialization for oplog stones that is reusing a transaction to read out of the oplog. We probably just need an abandonSnapshot() somewhere.

Comment by Eric Milkie [ 26/Jun/19 ]

The idea would be to eventually add a check in WiredTigerRecoveryUnit::setIsOplogReader():

invariant(!inActiveTxn());

Otherwise, if we are setting _isOplogReader after the transaction has started, it's too late to set _oplogVisibleTs in that same class instance (it's set in txnOpen()). And thus _oplogVisibleTs remains set to nothing, and thus we do not implement any oplog visibility when we then open a cursor on the oplog (see the constructor for WiredTigerRecordStoreCursorBase).

Comment by Dianna Hohensee (Inactive) [ 19/Jun/19 ]

milkie, could you provide more context on this task? I'm not sure what you mean by 'oplog stones initialization', where exactly we read from a data collection and then the oplog collection? I also don't know in what code we want to enforce oplog collection reads occur before other collection ops in a transaction – this sounds like something we'd want to add as part of this task if successful, and would help with understanding.

Generated at Thu Feb 08 04:24:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.