  1. Core Server
  2. SERVER-67466

Internal transactions API may be memory unsafe when outer OperationContext is interrupted


Details

    • Fully Compatible
    • ALL
    • v6.0
    • Sharding 2022-07-25
    • 160

    Description

      SERVER-67016 made it so that when the outer OperationContext is interrupted (e.g. due to a stepdown), the cancellation token used by the transaction API is canceled. This is important to ensure that the operations running within the internal transaction are themselves eventually interrupted. However, canceling the cancellation source isn't sufficient to ensure that the tasks running on the transaction API's executor have actually finished running. It is therefore still possible for the outer OperationContext to be interrupted and the cancellation source to be canceled, and yet for SyncTransactionWithRetries::runNoThrow() to return and the server to destroy the original command request before the tasks running on the transaction API's executor have drained.

      Take the _configsvrRefineCollectionShardKey command as an example. It calls ShardingCatalogManager::refineCollectionShardKey() with a ShardKeyPattern whose underlying BSONObj has its lifetime bound to the _configsvrRefineCollectionShardKey command request. If the OperationContext of the refineCollectionShardKey operation is interrupted (e.g. via the killOp command), then a task running on the transaction API's executor may continue to refer to the underlying BSONObj's memory even after it has been released.

      Instead, the transaction API should additionally wait for all tasks running on its executor to settle after canceling the cancellation source. This way, none of the lambda callback's captures can still be in use after SyncTransactionWithRetries::runNoThrow() has returned up the stack.

      auto txnFuture = _txn->run(std::move(callback));
      auto txnResult = txnFuture.getNoThrow(opCtx);
      // Cancel the source to guarantee the transaction will terminate if our opCtx was interrupted.
      _source.cancel();
      // Wait for the tasks on the transaction API's executor to settle before returning.
      txnFuture.wait();
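
      The cancel-then-wait pattern proposed above can be illustrated outside the server with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: CancelFlag and runAndDrain are hypothetical names, a std::thread stands in for the transaction API's executor, and join() plays the role of txnFuture.wait(). It is not the server's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>

// Hypothetical stand-in for a cancellation source (not the server's type).
struct CancelFlag {
    std::atomic<bool> canceled{false};
    void cancel() { canceled.store(true); }
    bool isCanceled() const { return canceled.load(); }
};

// Starts a background task that borrows `request` by reference, simulates the
// caller being interrupted, then cancels AND waits for the task to settle.
// Returns true if the task had fully finished before this function returned.
bool runAndDrain(const std::string& request) {
    CancelFlag source;
    std::atomic<bool> taskFinished{false};
    std::atomic<std::size_t> bytesSeen{0};

    // The lambda captures by reference, like a callback that refers to
    // memory owned by the command request.
    std::thread worker([&] {
        while (!source.isCanceled()) {
            bytesSeen.store(request.size());  // touches borrowed memory
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        taskFinished.store(true);
    });

    // Simulate the outer operation being interrupted while the task runs.
    std::this_thread::sleep_for(std::chrono::milliseconds(5));

    // Canceling only asks the task to stop; it may still be mid-iteration.
    source.cancel();

    // Waiting guarantees the task no longer touches `request` or the locals
    // above once we return, so the caller may safely destroy the request.
    worker.join();

    return taskFinished.load();
}
```

      Without the final wait, the function could return while the worker is still reading request, which is the same hazard described for the BSONObj owned by the command request.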
      

            People

              sanika.phanse@mongodb.com Sanika Phanse
              max.hirschhorn@mongodb.com Max Hirschhorn
