Core Server / SERVER-56631

Retryable write pre-fetch phase could miss entry from config.transactions when reading from donor secondaries

    • Fully Compatible
    • ALL
    • v5.0
    • Repl 2021-05-17, Repl 2021-05-31, Repl 2021-06-14

      SessionUpdateTracker::_updateSessionInfo() is used by secondary oplog application to coalesce multiple updates to the same config.transactions record into a single update that reflects the most recent retryable write statement. After SERVER-47844, the majority committed snapshot on a secondary is allowed to fall in the middle of a completed batch. As a result, a majority read of config.transactions on a secondary may not reflect all retryable writes committed at or before that majority commit point.

      For example, we could have a batch on a donor secondary like this:
      TS1 txn1 stmtId 1
      TS2 txn1 stmtId 2
      TS5 txn1 stmtId 3
      TS10 txn1 stmtId 4
      After applying this batch, the secondary performs a single update to config.transactions at TS10, and the transaction entry has lastWriteOpTime at TS10. When the retryable write pre-fetch phase reads from the donor secondary, the majority committed snapshot could be at TS5. In that case, a read at TS5 does not see the config.transactions entry written at TS10. If the recipient's startFetchingTimestamp is also TS5, the recipient ends up missing stmtId 1 and 2 after the migration, because the pre-fetch phase misses the transaction record written at TS10 and oplog fetching only starts at TS5. (See also SERVER-55305 and SERVER-55578.)
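
The scenario above can be reproduced with a toy model. This is only a sketch in plain Python, not server code: OplogEntry, coalesce, snapshot_read and recipient_stmt_ids are hypothetical names. It assumes the pre-fetch phase recovers the pre-startFetchingTimestamp history only for sessions whose config.transactions record is visible at the read timestamp, and that regular oplog fetching covers entries at or after startFetchingTimestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OplogEntry:
    ts: int        # timestamp of the retryable write oplog entry
    txn: str       # session/transaction identifier
    stmt_id: int   # retryable write statement id


def coalesce(batch):
    """Model of batch coalescing: one config.transactions write per
    session, stamped with the latest timestamp seen in the batch."""
    latest = {}
    for entry in batch:
        if entry.txn not in latest or entry.ts > latest[entry.txn].ts:
            latest[entry.txn] = entry
    return latest


def snapshot_read(session_writes, read_ts):
    """Session records visible to a snapshot read at read_ts."""
    return {txn: e for txn, e in session_writes.items() if e.ts <= read_ts}


def recipient_stmt_ids(batch, visible_sessions, start_fetching_ts):
    """Statement ids the recipient ends up knowing about (toy assumption):
    pre-fetch covers history before start_fetching_ts for visible sessions,
    oplog fetching covers everything at or after start_fetching_ts."""
    prefetched = {e.stmt_id for e in batch
                  if e.txn in visible_sessions and e.ts < start_fetching_ts}
    fetched = {e.stmt_id for e in batch if e.ts >= start_fetching_ts}
    return prefetched | fetched


batch = [
    OplogEntry(1, "txn1", 1),
    OplogEntry(2, "txn1", 2),
    OplogEntry(5, "txn1", 3),
    OplogEntry(10, "txn1", 4),
]
session_writes = coalesce(batch)            # single write for txn1 at TS10

visible_at_5 = snapshot_read(session_writes, 5)    # majority snapshot mid-batch
visible_at_10 = snapshot_read(session_writes, 10)  # read at the batch boundary

missed = {e.stmt_id for e in batch} - recipient_stmt_ids(batch, visible_at_5, 5)
```

In this model the read at TS5 returns no record for txn1, so `missed` contains stmtId 1 and 2, while a read at the batch boundary (TS10) sees the record and nothing is lost.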

      This ticket should verify this behavior. One possible fix is to read with local read concern (which on secondaries would always read at lastApplied, i.e. the last batch boundary) and then wait for the returned operationTime to be majority committed on the donor, similar to what we did for the cloners.
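
The proposed fix can be sketched the same way. This is a toy model under stated assumptions, not the actual implementation: ToyDonorSecondary and prefetch_with_local_read are hypothetical names, and the wait is modeled as a callback that lets the majority commit point advance.

```python
class ToyDonorSecondary:
    """Hypothetical stand-in for a donor secondary."""

    def __init__(self, session_records, last_applied, majority_commit_point):
        self.session_records = session_records      # txn -> lastWriteOpTime
        self.last_applied = last_applied            # always a batch boundary
        self.majority_commit_point = majority_commit_point

    def read_local(self):
        """Local read: sees all applied writes and returns
        (records, operationTime), with operationTime = lastApplied."""
        return dict(self.session_records), self.last_applied

    def advance_commit_point(self, ts):
        self.majority_commit_point = max(self.majority_commit_point, ts)

    def is_majority_committed(self, ts):
        return self.majority_commit_point >= ts


def prefetch_with_local_read(donor, wait_step):
    """Read with local read concern, then block until the returned
    operationTime is majority committed on the donor."""
    records, operation_time = donor.read_local()
    while not donor.is_majority_committed(operation_time):
        wait_step()  # in the toy model, lets replication make progress
    return records, operation_time


# Batch applied through TS10, but the majority commit point is still at TS5.
donor = ToyDonorSecondary({"txn1": 10}, last_applied=10, majority_commit_point=5)
records, operation_time = prefetch_with_local_read(
    donor, wait_step=lambda: donor.advance_commit_point(10)
)
```

Here the local read returns the txn1 record written at TS10, and the function only returns once TS10 is majority committed, so the data handed to the recipient is both complete for the batch and durable.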

            Assignee:
            wenbin.zhu@mongodb.com Wenbin Zhu
            Reporter:
            lingzhi.deng@mongodb.com Lingzhi Deng