Resharding skips noop creation for sessions with incomplete oplog history

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • ALL
    • ClusterScalability 30Mar-13Apr
    • 200

      During resharding, ReshardingOplogSessionApplication is responsible for creating noop oplog entries on recipient shards for retryable writes committed after the fetchTimestamp. These noops preserve the writes' retryability after resharding completes. To create a noop, the oplog session application code calls session_catalog_migration_util::runWithSessionCheckedOutIfStatementNotExecuted, which checks out the session and then runs the callable that creates the noop. Inside this utility, the session state is refreshed from config.transactions and the oplog. If the oplog entry referenced by config.transactions.lastWriteOpTime has been truncated, fetchActiveTransactionHistory marks the session with hasIncompleteHistory = true.
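
The refresh step can be illustrated with a small Python model. This is only a sketch, not the server's C++ code: `OplogEntry`, `SessionHistory`, `fetch_history`, and the dict standing in for the oplog are all hypothetical names, and the backwards walk over a `prev_op_time` link is an assumption about how the history is read. The only behavior taken from the description above is that a truncated entry results in the incomplete-history flag being set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class OplogEntry:
    stmt_id: int
    prev_op_time: Optional[int]  # assumed link to the session's previous write


@dataclass
class SessionHistory:
    committed_stmt_ids: Set[int] = field(default_factory=set)
    has_incomplete_history: bool = False


def fetch_history(last_write_op_time: Optional[int],
                  oplog: Dict[int, OplogEntry]) -> SessionHistory:
    """Rebuild a session's statement history starting at last_write_op_time.

    `oplog` maps timestamp -> entry; a missing key models a truncated entry.
    """
    history = SessionHistory()
    ts = last_write_op_time
    while ts is not None:
        entry = oplog.get(ts)
        if entry is None:
            # The session record points at an entry that no longer exists.
            history.has_incomplete_history = True
            break
        history.committed_stmt_ids.add(entry.stmt_id)
        ts = entry.prev_op_time
    return history


# Whole chain still in the oplog: both statements are recovered.
full = fetch_history(20, {10: OplogEntry(0, None), 20: OplogEntry(1, 10)})

# Entry at ts=20 has been truncated: nothing is recovered and the flag is set.
truncated = fetch_history(20, {})
```

In this model, `truncated` ends up with an empty statement set and `has_incomplete_history` set to `True`, which is the state that leads to the failure described next.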

      Subsequently, checkStatementExecuted() throws IncompleteTransactionHistory because the statement is not in the committed statements cache and the history is incomplete. The catch handler in runWithSessionCheckedOutIfStatementNotExecuted catches this exception indiscriminately and returns boost::none, so the callable that creates the noop is never invoked. As a result, the retryable write's session record in config.transactions continues to reference the original, now-truncated oplog entry. Any subsequent retry of the write fails with "IncompleteTransactionHistory: oplog no longer contains the complete write history of this transaction".
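
The skip can be reproduced in a simplified Python model of the control flow described above. Again this is a sketch under stated assumptions rather than the actual implementation: `Session`, `run_if_statement_not_executed`, and `create_noop` are hypothetical stand-ins, `None` plays the role of boost::none, and returning `None` for an already-executed statement is an assumption inferred from the utility's name.

```python
class IncompleteTransactionHistory(Exception):
    """Stand-in for the server error of the same name."""


class Session:
    def __init__(self, committed_stmt_ids, has_incomplete_history):
        self.committed_stmt_ids = set(committed_stmt_ids)
        self.has_incomplete_history = has_incomplete_history

    def check_statement_executed(self, stmt_id):
        if stmt_id in self.committed_stmt_ids:
            return True
        if self.has_incomplete_history:
            # Not in the cache, and the history cannot prove it never ran.
            raise IncompleteTransactionHistory(
                "oplog no longer contains the complete write history")
        return False


def run_if_statement_not_executed(session, stmt_id, callable_):
    """Model of the utility: run callable_ only if stmt_id has not executed."""
    try:
        if session.check_statement_executed(stmt_id):
            return None  # assumed: nothing to do for an executed statement
    except IncompleteTransactionHistory:
        # The exception is swallowed, so the callable below never runs.
        return None
    return callable_()


noops = []


def create_noop():
    noops.append("noop")
    return "created"


# Complete history, statement not yet executed: the noop is created.
healthy = run_if_statement_not_executed(Session([], False), 5, create_noop)

# Incomplete history: the caller gets None and no second noop is written.
skipped = run_if_statement_not_executed(Session([], True), 5, create_noop)
```

After both calls `noops` holds a single entry. In this model the caller sees `None` both when the statement already executed and when the history is incomplete, so it has no way to tell that the noop was skipped.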

      Impact: retryable writes that hit this path lose their retryability guarantee during resharding.

            Assignee:
            Nandini Bhartiya
            Reporter:
            Abdul Qadeer
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: