Core Server / SERVER-46466

Race with findAndModify retryable write and session migration

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 3.6.18, 4.0.17
    • Affects Version/s: 3.6.0, 4.0.0
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.0, v3.6
    • Sprint: Sharding 2020-03-09

      Race:

      1. A findAndModify write with txnNumber 10 is executed on shardA.
      2. Migration of a chunk from shardA to shardB starts.
      3. The session migration thread pulls the oplog entry for the write in step 1, passes all the checks, and is about to write the oplog entry.
      4. A new retryable write with txnNumber 11 starts and successfully writes to the oplog.
      5. The session migration thread writes the oplog entry for txnNumber 10. The primary has now successfully written an oplog entry with a higher optime but a lower txnNumber.
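
The interleaving in steps 3 to 5 can be replayed deterministically with a toy model. This is only a sketch: `Primary`, `check`, `append`, and the session id `lsid-1` are hypothetical names invented for illustration, not server code. It assumes the check and the oplog write are two separate steps with nothing preventing another writer from running in between.

```python
from dataclasses import dataclass, field


@dataclass
class OplogEntry:
    optime: int
    session_id: str
    txn_number: int
    note: str


@dataclass
class Primary:
    """Toy primary (hypothetical): an oplog plus the highest txnNumber per session."""
    oplog: list = field(default_factory=list)
    last_txn: dict = field(default_factory=dict)

    def check(self, session_id, txn_number):
        # Validation step: is this txnNumber at least as new as what we have seen?
        return txn_number >= self.last_txn.get(session_id, -1)

    def append(self, session_id, txn_number, note):
        # Write step: assigns the next optime and does NOT re-validate.
        optime = len(self.oplog) + 1
        self.oplog.append(OplogEntry(optime, session_id, txn_number, note))
        self.last_txn[session_id] = max(txn_number, self.last_txn.get(session_id, -1))


def replay_race():
    primary = Primary()
    # Step 3: the migration thread validates txnNumber 10, then is paused.
    migration_passed_checks = primary.check("lsid-1", 10)
    # Step 4: a new retryable write with txnNumber 11 validates and writes.
    if primary.check("lsid-1", 11):
        primary.append("lsid-1", 11, "new retryable write")
    # Step 5: the migration thread resumes and writes using its stale check result.
    if migration_passed_checks:
        primary.append("lsid-1", 10, "migrated findAndModify")
    return primary.oplog


if __name__ == "__main__":
    for entry in replay_race():
        print(entry)
```

In this model the second entry carries the higher optime but the lower txnNumber, which is the state described in step 5.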

      Consequence:

      Secondaries can potentially hit this fassert:
      https://github.com/mongodb/mongo/blob/r4.0.15/src/mongo/db/repl/session_update_tracker.cpp#L98
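
To see why such an oplog could be fatal on a secondary, here is a rough sketch of a per-session tracker. It assumes, without reproducing the linked code, that the secondary treats an entry whose txnNumber is lower than one already seen for the same session as an unrecoverable error. `track_session_updates` and `TxnNumberRegression` are hypothetical names.

```python
class TxnNumberRegression(Exception):
    """Stands in for the fatal assertion in this toy model (hypothetical)."""


def track_session_updates(batch):
    """Walk (session_id, txn_number) pairs in oplog order.

    Returns the latest txnNumber per session, or raises if a session's
    txnNumber ever goes backwards.
    """
    latest = {}
    for session_id, txn_number in batch:
        seen = latest.get(session_id)
        if seen is not None and txn_number < seen:
            raise TxnNumberRegression(
                f"session {session_id}: txnNumber {txn_number} < {seen}"
            )
        latest[session_id] = txn_number
    return latest


if __name__ == "__main__":
    # Oplog order produced by the race: txnNumber 11 first, then 10.
    try:
        track_session_updates([("lsid-1", 11), ("lsid-1", 10)])
    except TxnNumberRegression as exc:
        print("secondary would abort:", exc)
```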

      Note: this race cannot occur in v4.2, because the session is checked out while the session migration thread processes the oplog entries, which rules out this interleaving.
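
The effect of checking out the session can be approximated by holding one lock across both the check and the write. The sketch below models only that idea. `Session` and `checked_out_write` are hypothetical names, and how the server actually handles the stale entry is not covered here.

```python
import threading


class Session:
    """Toy session (hypothetical): check and write happen under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.last_txn = -1
        self.oplog = []

    def checked_out_write(self, txn_number, note):
        # Holding the lock for the whole check-then-write sequence means no
        # other writer can slip in between the check and the append.
        with self._lock:
            if txn_number < self.last_txn:
                return False  # stale txnNumber: rejected in this toy model
            self.oplog.append((txn_number, note))
            self.last_txn = txn_number
            return True


def run_concurrently(session):
    # Run the two competing writers on separate threads.
    writers = [
        threading.Thread(target=session.checked_out_write, args=(10, "migrated findAndModify")),
        threading.Thread(target=session.checked_out_write, args=(11, "new retryable write")),
    ]
    for t in writers:
        t.start()
    for t in writers:
        t.join()
    return session.oplog


if __name__ == "__main__":
    print(run_concurrently(Session()))
```

Whichever thread wins the lock first, the txnNumbers in the resulting oplog never decrease.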

      All of the following conditions must hold to hit this race:

      • running a version older than v4.2
      • using retryable writes with findAndModify
      • chunk migrations happening while retryable writes are in use

            Assignee:
            randolph@mongodb.com Randolph Tan
            Reporter:
            randolph@mongodb.com Randolph Tan
            Votes:
            0
            Watchers:
            13
