Core Server / SERVER-52623

Aborting in-progress transactions on step-up with eMRC=off can set the stable timestamp ahead of the all durable timestamp

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Replication

      While working on SERVER-51387, which adds an assertion that the stable timestamp is never set higher than the all durable timestamp, I hit a failure in this test when in-progress transactions are aborted on step-up with eMRC=off (enableMajorityReadConcern=false).

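      Whenever the in-progress transactions are aborted, the stable timestamp ends up exactly one increment ahead of the all durable timestamp. In the log below, both share t=1604518513, but the stable timestamp has i=4 and the all durable timestamp has i=3: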

      TXN      [OplogApplier-0] Aborting in-progress transactions on stepup.
      TXN      [OplogApplier-0] New transaction started {"txnNumber":0,"lsid":{"uuid":{"$uuid":"b7c9b1d4-1883-46c5-b73d-d235c3d41623"}}}
      TXN      [OplogApplier-0] Aborting transaction {"sessionId":{"id":{"$uuid":"b7c9b1d4-1883-46c5-b73d-d235c3d41623"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0}
      TXN      [OplogApplier-0] transaction {"parameters":{"lsid":{"id":{"$uuid":"b7c9b1d4-1883-46c5-b73d-d235c3d41623"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0,"autocommit":false,"readConcern":{"provenance":"clientSupplied"}},"readTimestamp":"Timestamp(0, 0)","terminationCause":"aborted","timeActiveMicros":0,"timeInactiveMicros":379,"numYields":0,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":3}},"ReplicationStateTransition":{"acquireCount":{"w":4,"W":1}},"Global":{"acquireCount":{"r":1,"w":3}},"Database":{"acquireCount":{"r":1,"W":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":2}}},"storage":{},"wasPrepared":false,"durationMillis":0}
      REPL     [OplogApplier-0] Setting replication's stable optime {"stableOpTime":{"ts":{"$timestamp":{"t":1604518513,"i":4}},"t":2}}
      STORAGE  [OplogApplier-0] The stable timestamp was greater than the all durable timestamp {"stableTimestamp":{"$timestamp":{"t":1604518513,"i":4}},"allDurableTimestamp":{"$timestamp":{"t":1604518513,"i":3}}}
      -        [OplogApplier-0] Fatal assertion {"msgid":5138700,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":1925}
      

      When we recalculate the stable timestamp, we only take the all durable timestamp into consideration if the node canAcceptNonLocalWrites(). However, that flag is only updated once the state transition is complete, and the node is still stepping up while it aborts these transactions. As a result, the all durable timestamp is not used to cap the stable timestamp during this window.

      I believe this works for eMRC=on today because the stable timestamp is then derived from the commit point instead of the last applied optime, and in my testing the commit point was less than the all durable timestamp.
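
      To make the reasoning above concrete, here is a minimal Python model of the recalculation as described in this ticket. It is only a sketch and not the server's C++ code: the names Timestamp, recalculate_stable_timestamp and violates_invariant are hypothetical, the last applied value is assumed to equal the stable optime shown in the log, and the commit point value is invented for illustration (the ticket only says it was less than the all durable timestamp).

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    """Simplified stand-in for a replication timestamp: (seconds, increment)."""
    t: int
    i: int


def recalculate_stable_timestamp(
    last_applied: Timestamp,
    commit_point: Timestamp,
    all_durable: Timestamp,
    emrc_enabled: bool,
    can_accept_non_local_writes: bool,
) -> Timestamp:
    """Hypothetical model of the behaviour described in the ticket."""
    # Per the description: eMRC=on starts from the commit point,
    # eMRC=off starts from the last applied optime.
    candidate = commit_point if emrc_enabled else last_applied
    # The all durable timestamp only caps the candidate when the node
    # can already accept non-local writes, i.e. after step-up completes.
    if can_accept_non_local_writes:
        candidate = min(candidate, all_durable)
    return candidate


def violates_invariant(stable: Timestamp, all_durable: Timestamp) -> bool:
    """True when the stable timestamp is ahead of the all durable timestamp."""
    return stable > all_durable


# Values taken from the log excerpt; LAST_APPLIED is assumed to equal
# the stable optime that was being set.
LAST_APPLIED = Timestamp(1604518513, 4)
ALL_DURABLE = Timestamp(1604518513, 3)
# Invented value: the ticket only states it was below all durable.
COMMIT_POINT = Timestamp(1604518513, 2)

if __name__ == "__main__":
    for emrc, writable in [(False, False), (False, True), (True, False)]:
        stable = recalculate_stable_timestamp(
            LAST_APPLIED, COMMIT_POINT, ALL_DURABLE, emrc, writable
        )
        print(emrc, writable, stable, violates_invariant(stable, ALL_DURABLE))
```

      Under these assumptions, only the combination eMRC=off with canAcceptNonLocalWrites() still false produces a stable timestamp ahead of the all durable timestamp, which matches the failure in the log.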
       

       

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            gregory.wlodarek@mongodb.com Gregory Wlodarek
            Votes:
            0
            Watchers:
            12
