-
Type: Bug
-
Resolution: Gone away
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
Replication
-
ALL
While working on SERVER-51387, which adds an assertion that the stable timestamp is never set higher than the all durable timestamp, I ran into a problem with aborting in-progress transactions on step-up with eMRC=off on this test.
While aborting the in-progress transactions, the stable timestamp is consistently set one increment higher than the all durable timestamp (in the log below, increment 4 versus 3 within the same second).
TXN [OplogApplier-0] Aborting in-progress transactions on stepup.
TXN [OplogApplier-0] New transaction started {"txnNumber":0,"lsid":{"uuid":{"$uuid":"b7c9b1d4-1883-46c5-b73d-d235c3d41623"}}}
TXN [OplogApplier-0] Aborting transaction {"sessionId":{"id":{"$uuid":"b7c9b1d4-1883-46c5-b73d-d235c3d41623"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0}
TXN [OplogApplier-0] transaction {"parameters":{"lsid":{"id":{"$uuid":"b7c9b1d4-1883-46c5-b73d-d235c3d41623"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0,"autocommit":false,"readConcern":{"provenance":"clientSupplied"}},"readTimestamp":"Timestamp(0, 0)","terminationCause":"aborted","timeActiveMicros":0,"timeInactiveMicros":379,"numYields":0,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":3}},"ReplicationStateTransition":{"acquireCount":{"w":4,"W":1}},"Global":{"acquireCount":{"r":1,"w":3}},"Database":{"acquireCount":{"r":1,"W":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":2}}},"storage":{},"wasPrepared":false,"durationMillis":0}
REPL [OplogApplier-0] Setting replication's stable optime {"stableOpTime":{"ts":{"$timestamp":{"t":1604518513,"i":4}},"t":2}}
STORAGE [OplogApplier-0] The stable timestamp was greater than the all durable timestamp {"stableTimestamp":{"$timestamp":{"t":1604518513,"i":4}},"allDurableTimestamp":{"$timestamp":{"t":1604518513,"i":3}}}
- [OplogApplier-0] Fatal assertion {"msgid":5138700,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":1925}
When we recalculate the stable timestamp, we only take the all durable timestamp into consideration if the node canAcceptNonLocalWrites(). However, because the node is still stepping up, the canAcceptNonLocalWrites() flag has not been updated yet; it is only updated once the state transition is complete.
I believe this works for eMRC=on today because we use the commit point instead of the last applied optime, and in my testing the commit point was less than the all durable timestamp.
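
The behavior described above can be illustrated with a small model. This is only a sketch, not the server code: the function name choose_stable_timestamp, the Timestamp tuple, and the use of min() to cap the candidate are assumptions made for illustration. It also assumes that, with eMRC=off, the candidate equals the stable optime shown in the log (increment 4), while the all durable timestamp is the one reported by the storage layer (increment 3).

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    """Simplified (seconds, increment) pair; tuples compare field by field."""
    t: int
    i: int


def choose_stable_timestamp(candidate, all_durable, can_accept_non_local_writes):
    """Hypothetical model of the stable timestamp recalculation.

    The all durable timestamp only caps the candidate when the node
    can accept non-local writes, as described in this ticket.
    """
    if can_accept_non_local_writes:
        return min(candidate, all_durable)
    # During step-up the flag is still false, so no cap is applied.
    return candidate


# Values taken from the log in this ticket.
last_applied = Timestamp(1604518513, 4)  # assumed candidate with eMRC=off
all_durable = Timestamp(1604518513, 3)

during_step_up = choose_stable_timestamp(last_applied, all_durable, False)
after_step_up = choose_stable_timestamp(last_applied, all_durable, True)

# True means the asserted invariant (stable <= all durable) is violated.
print(during_step_up > all_durable)
print(after_step_up > all_durable)
```

In this model, the uncapped result during step-up exceeds the all durable timestamp, which matches the condition that trips the fatal assertion, whereas the capped result after the transition does not.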
- related to
-
SERVER-52956 Add storage debug method to dump system-wide RecoveryUnit/transaction state
- Closed
-
SERVER-51387 Assert that the stable timestamp is never set higher than the WT all_durable timestamp
- Closed