SERVER-46466 (Core Server)

Race with findAndModify retryable write and session migration


Details

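    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.6.0, 4.0.0
    • Fix Version/s: 3.6.18, 4.0.17
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.0, v3.6
    • Sprint: Sharding 2020-03-09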

    Description

      Race:

      1. A findAndModify retryable write with txnNumber 10 is executed on shardA.
      2. A migration of a chunk from shardA to shardB starts.
      3. The session migration thread on shardB pulls the oplog entry for the write in step 1, passes all of its checks, and is about to write the entry to shardB's oplog.
      4. A new retryable write on the same session, with txnNumber 11, starts and is successfully written to shardB's oplog.
      5. The session migration thread writes the oplog entry for txnNumber 10. The primary has now written an oplog entry for the session with a higher optime but a lower txnNumber than the entry from step 4.
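
      The interleaving above can be replayed step by step in a toy model. This is a simplified sketch, not MongoDB code: shardB's oplog is a Python list, optimes come from a counter, and `Oplog`, `migration_check` and the session id "lsid-1" are hypothetical. It assumes that one of the "checks" in step 3 compares the incoming txnNumber with the newest one shardB already has for the session. What it shows is that the check and the write are separate steps, so the write from step 4 can land between them.

```python
from dataclasses import dataclass, field


@dataclass
class OplogEntry:
    optime: int
    session_id: str
    txn_number: int
    source: str


@dataclass
class Oplog:
    entries: list = field(default_factory=list)
    next_optime: int = 1

    def append(self, session_id, txn_number, source):
        # Every write gets the next, strictly increasing optime.
        entry = OplogEntry(self.next_optime, session_id, txn_number, source)
        self.next_optime += 1
        self.entries.append(entry)
        return entry

    def latest_txn_number(self, session_id):
        # Highest txnNumber written so far for this session, or -1 if none.
        nums = [e.txn_number for e in self.entries if e.session_id == session_id]
        return max(nums, default=-1)


def migration_check(oplog, session_id, txn_number):
    # Hypothetical stand-in for the "checks" in step 3: accept the entry
    # only if shardB has nothing newer for this session.
    return txn_number >= oplog.latest_txn_number(session_id)


recipient = Oplog()  # shardB's oplog

# Steps 1-3: the migration thread fetched txnNumber 10 from shardA and
# its check passes, because shardB has nothing newer for the session yet.
assert migration_check(recipient, "lsid-1", 10)

# Step 4: a new retryable write on the same session reaches shardB first.
recipient.append("lsid-1", 11, "client")

# Step 5: the migration thread writes without re-checking.
recipient.append("lsid-1", 10, "migration")

for e in recipient.entries:
    print(e.optime, e.txn_number, e.source)
```

      The loop prints txnNumber 11 at optime 1 followed by the migrated txnNumber 10 at optime 2. Running `migration_check(recipient, "lsid-1", 10)` again at this point would return False, so the check only protects the session if nothing else can write to it between the check and the write.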

      Consequence:

      Secondaries can potentially hit this fassert:
      https://github.com/mongodb/mongo/blob/r4.0.15/src/mongo/db/repl/session_update_tracker.cpp#L98
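
      The invariant behind that fassert can be sketched as follows: while applying oplog entries, a secondary tracks the latest txnNumber per session and expects it never to decrease as optime increases. The model below is a simplification for illustration only; `track_session_updates` and `FatalAssertion` are hypothetical names, the real logic in session_update_tracker.cpp differs in detail, and the sketch assumes both raced entries are applied in the same batch.

```python
class FatalAssertion(Exception):
    """Stands in for a server fassert, which would terminate the process."""


def track_session_updates(batch):
    """Walk (optime, session_id, txn_number) tuples in optime order and
    return the latest txnNumber per session, failing if one goes down."""
    latest = {}
    for optime, session_id, txn_number in batch:
        prev = latest.get(session_id)
        if prev is not None and txn_number < prev:
            raise FatalAssertion(
                f"session {session_id}: txnNumber {txn_number} at optime "
                f"{optime} is lower than earlier txnNumber {prev}"
            )
        latest[session_id] = txn_number
    return latest


normal = [(1, "lsid-1", 10), (2, "lsid-1", 11)]
raced = [(1, "lsid-1", 11), (2, "lsid-1", 10)]  # order produced by the race

print(track_session_updates(normal))  # {'lsid-1': 11}
try:
    track_session_updates(raced)
except FatalAssertion as exc:
    print("fassert:", exc)
```

      The normal order applies cleanly; the raced order fails on its second entry, where txnNumber drops from 11 to 10 while optime goes up.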

      Note: this race is not possible in v4.2, because the session migration thread checks out the session before it processes the migrated oplog entries. Checking out the session serializes that thread with other writes on the same session, so the interleaving above cannot occur.
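
      One way to see why checking out the session closes the window: if the session migration thread holds the session for its whole check-then-write sequence, and client writes on that session must also hold it, the two can no longer interleave. The sketch below models checkout as a per-session lock. `SessionCatalog`, `checkout`, `client_write` and `migrate_entry` are hypothetical names (loosely inspired by the server concept, not its API), and the sketch assumes a stale migrated entry is simply skipped; this ticket does not say how v4.2 handles that case.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class SessionCatalog:
    """Hypothetical model: checking out a session = holding its lock."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)  # one lock per session
        self._locks_guard = threading.Lock()       # protects self._locks
        self._oplog_lock = threading.Lock()        # protects oplog/latest
        self.oplog = []    # (session_id, txn_number, source) in write order
        self.latest = {}   # session_id -> txnNumber of the newest write

    @contextmanager
    def checkout(self, session_id):
        with self._locks_guard:
            lock = self._locks[session_id]
        with lock:
            yield

    def _write(self, session_id, txn_number, source):
        with self._oplog_lock:
            self.oplog.append((session_id, txn_number, source))
            self.latest[session_id] = txn_number

    def client_write(self, session_id, txn_number):
        # In this model clients only ever move a session's txnNumber forward.
        with self.checkout(session_id):
            self._write(session_id, txn_number, "client")

    def migrate_entry(self, session_id, txn_number):
        # The check and the write happen under one checkout, so no other
        # write on this session can land between them.
        with self.checkout(session_id):
            if txn_number < self.latest.get(session_id, -1):
                return False  # stale: shardB already has a newer txnNumber
            self._write(session_id, txn_number, "migration")
            return True


def migrate_10(catalog):
    catalog.migrate_entry("lsid-1", 10)


def client_11(catalog):
    catalog.client_write("lsid-1", 11)


def run(steps):
    catalog = SessionCatalog()
    for step in steps:
        step(catalog)
    return [txn for _, txn, _ in catalog.oplog]


print(run([migrate_10, client_11]))  # [10, 11]
print(run([client_11, migrate_10]))  # [11] (stale entry skipped)

# Concurrent run: the result must match one of the sequential orders.
catalog = SessionCatalog()
workers = [threading.Thread(target=f, args=(catalog,)) for f in (migrate_10, client_11)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print([txn for _, txn, _ in catalog.oplog])
```

      Because both paths serialize on the same per-session lock, any concurrent run is equivalent to one of the two sequential orders, and the session's txnNumbers in the oplog never decrease.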

      Conditions required to hit this race:

      • running a version older than v4.2
      • using retryable writes with findAndModify
      • chunk migrations running while retryable writes are in use


            People

              Randolph Tan (randolph@mongodb.com)
