  1. Core Server
  2. SERVER-55305

Retryable write may execute more than once if primary had transitioned through rollback to stable

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: Replication, Sharding
    • None
    • Fully Compatible
    • ALL
    • v4.4, v4.2, v4.0
    • Repl 2021-04-19, Repl 2021-05-03, Repl 2021-05-17
    • 200

      SessionUpdateTracker::_updateSessionInfo() is used by secondary oplog application to coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement. The changes from 02020fa as part of SERVER-47844 made it possible for a secondary to choose its stable_timestamp as a majority-committed timestamp from within an oplog batch rather than always being on a batch boundary. The combination of these two can lead to the following sequence:

      1. During a single batch of oplog application:
        1. User data write for stmtId=0 at t=10.
        2. User data write for stmtId=1 at t=11.
        3. User data write for stmtId=2 at t=12.
        4. Session txn record write at t=12 with stmtId=2 as lastWriteOpTime.
          • In particular, no session txn record write for t=10 with stmtId=0 as lastWriteOpTime or for t=11 with stmtId=1 as lastWriteOpTime because they were coalesced by the SessionUpdateTracker.
      2. Rollback to stable timestamp t=10.
      3. The session txn record won't exist with stmtId=0 as lastWriteOpTime (because the write was entirely skipped by oplog application) despite the user data write for stmtId=0 being reflected on-disk. This allows stmtId=0 to be re-executed by this node if it later becomes primary.
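
      The sequence above can be sketched as a small toy model. This is an illustration only: the function names, the (timestamp, stmtId) tuples, and the "covered up to" rule are assumptions made for the sketch and do not mirror SessionUpdateTracker or the storage engine.

```python
# Toy model of the sequence above. All names here are hypothetical and do not
# correspond to MongoDB internals.

def state_after_rollback(batch, stable_ts, coalesce):
    """Return (user_writes, session_record) that survive a rollback to stable_ts.

    batch is a list of (timestamp, stmt_id) pairs in oplog order.
    """
    # With coalescing, only the last statement in the batch writes the session record.
    session_writes = [batch[-1]] if coalesce else list(batch)
    user_writes = [(ts, stmt) for ts, stmt in batch if ts <= stable_ts]
    surviving = [(ts, stmt) for ts, stmt in session_writes if ts <= stable_ts]
    session_record = surviving[-1] if surviving else None
    return user_writes, session_record


def reexecutable_stmts(user_writes, session_record):
    """Statements whose data write survived but which the session record does not cover."""
    covered_up_to = session_record[0] if session_record else float("-inf")
    return [stmt for ts, stmt in user_writes if ts > covered_up_to]


batch = [(10, 0), (11, 1), (12, 2)]

# Coalesced batch, then rollback to t=10: data for stmtId=0 survives, record does not.
user, record = state_after_rollback(batch, stable_ts=10, coalesce=True)
print(user, record, reexecutable_stmts(user, record))

# Without coalescing, the record written at t=10 survives and covers stmtId=0.
user, record = state_after_rollback(batch, stable_ts=10, coalesce=False)
print(user, record, reexecutable_stmts(user, record))
```

      In this model, the coalesced case leaves stmtId=0 applied but unrecorded, while the uncoalesced case leaves nothing re-executable.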
      Impact on 4.0, 4.2, and 4.4 branches

      The stable optime candidates list prevents this issue for retryable inserts, updates, and deletes applied during secondary oplog application.

      However, retryable inserts on primaries also coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement. This happens because OpObserverImpl::onInserts() calls TransactionParticipant::onWriteOpCompletedOnPrimary() once for a whole batch of insert statements (a vectored insert).

      • Only retryable inserts are impacted.
      • A retry attempt fails with a DuplicateKey error so long as the document wasn't deleted by another client in the meantime. (The document is re-inserted otherwise.)
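
      The two retry outcomes for an insert can be illustrated with a dictionary standing in for a collection keyed by _id. This is a sketch under the assumption that the session record was lost, so the retry is re-executed as a fresh insert; DuplicateKeyError, reexecute_insert, and retry_outcome are hypothetical names for this example.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the server's DuplicateKey error in this toy model."""


def reexecute_insert(collection, doc):
    """Re-run an insert whose session record was lost; collection maps _id -> document."""
    if doc["_id"] in collection:
        raise DuplicateKeyError(doc["_id"])
    collection[doc["_id"]] = dict(doc)


def retry_outcome(collection, doc):
    """Describe what the retried insert does in this model."""
    try:
        reexecute_insert(collection, doc)
    except DuplicateKeyError:
        return "DuplicateKey"
    return "re-inserted"


doc = {"_id": 1, "x": "a"}
still_present = {1: dict(doc)}   # the original insert is still on disk
deleted_meanwhile = {}           # another client deleted the document

print(retry_outcome(still_present, doc))
print(retry_outcome(deleted_meanwhile, doc))
```

      The first call reports DuplicateKey; the second silently re-inserts the document, which is the case where the retry is observable as a second execution.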
      Impact on 4.9 and master branches

      The stable optime candidates list was removed and so this issue exists for retryable inserts, updates, and deletes applied during secondary oplog application. Retryable inserts on primaries continue to coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement.

      • Retryable inserts, updates, and deletes are all impacted.
      • A retry attempt for an update can execute more than once (e.g. incrementing a counter twice).
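
      The double-increment case can be shown with a minimal sketch. It assumes a set of recorded statement ids stands in for the session txn record and that clearing it models the record being lost across rollback while the data write survives; run_increment and its parameters are hypothetical.

```python
def run_increment(doc, recorded_stmt_ids, stmt_id, amount):
    """Apply a retryable increment unless stmt_id is already recorded as executed."""
    if stmt_id in recorded_stmt_ids:
        return  # retry recognized, statement is not executed again
    doc["counter"] += amount
    recorded_stmt_ids.add(stmt_id)


doc = {"counter": 0}
recorded = set()

run_increment(doc, recorded, stmt_id=0, amount=1)  # first execution
recorded.clear()  # models the session record missing after rollback
run_increment(doc, recorded, stmt_id=0, amount=1)  # retry is executed again

print(doc["counter"])
```

      With the record intact the retry is a no-op and the counter stays at 1; with the record lost, as above, the counter ends at 2.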

      This issue was discovered while reasoning through why the atClusterTime read on config.transactions to fix SERVER-54626 was insufficient (hence SERVER-55214). Shout out to daniel.gottlieb for the assist!

            Assignee:
            jason.chan@mongodb.com Jason Chan
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0
            Watchers:
            18
