Core Server / SERVER-38257

TransactionParticipant::abortArbitraryTransaction can cause incorrect transaction metrics to be recorded

    • Fully Compatible
    • ALL
    • Repl 2019-04-08
    • 0

      Based on my conversation with tess.avitabile, I think the following bug exists in the transaction metrics' active and inactive counts (it may affect other metrics as well; I'm not sure):

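      If a transaction is aborted, whether the number of active or inactive transactions is decremented depends on whether the TxnResources were stashed at the time of the abort. If _txnResourceStash is set, the transaction is treated as inactive and the inactive count is decremented; if it is boost::none, the transaction is treated as active and the active count is decremented.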

      This usually works because in a transaction request's flow:

      • First, TransactionParticipant::beginOrContinue calls TransactionMetricsObserver::onStart, which increments the number of currently inactive transactions
      • Later, TransactionParticipant::unstashTransactionResources calls TransactionMetricsObserver::onUnstash (whether the TxnResources already existed or were just created), which increments the number of currently active transactions and decrements the number of currently inactive transactions.
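
The bookkeeping described so far can be modeled with a small single-threaded sketch. This is not the server code: `TxnGauges`, `ParticipantModel`, `abortWhileActive`, and `abortWhileStashed` are hypothetical names, a plain `bool` stands in for `_txnResourceStash`, and the `stash()` step is assumed to be the inverse of the unstash step. It only shows why the counts balance when the abort happens after the transaction has been unstashed at least once.

```cpp
#include <cassert>

// Hypothetical stand-in for the server-wide active/inactive counts.
struct TxnGauges {
    long long active = 0;
    long long inactive = 0;
};

// Hypothetical single-threaded model of one participant; not MongoDB source.
class ParticipantModel {
public:
    explicit ParticipantModel(TxnGauges& g) : _g(g) {}

    // Models beginOrContinue -> onStart: a new transaction counts as inactive.
    void beginOrContinue() { ++_g.inactive; }

    // Models unstashTransactionResources -> onUnstash: inactive becomes active.
    void unstash() {
        _hasStash = false;
        --_g.inactive;
        ++_g.active;
    }

    // Assumed inverse of unstash(): active becomes inactive, resources are stashed.
    void stash() {
        _hasStash = true;
        --_g.active;
        ++_g.inactive;
    }

    // Models the abort path: the presence of a stash alone picks the count to decrement.
    void abort() {
        if (_hasStash) {
            --_g.inactive;
        } else {
            --_g.active;
        }
        _hasStash = false;
    }

private:
    TxnGauges& _g;
    bool _hasStash = false;  // false plays the role of boost::none
};

// Abort while the transaction is running an operation (resources unstashed).
TxnGauges abortWhileActive() {
    TxnGauges g;
    ParticipantModel p(g);
    p.beginOrContinue();
    p.unstash();
    assert(g.active == 1 && g.inactive == 0);
    p.abort();
    return g;
}

// Abort between operations (resources stashed).
TxnGauges abortWhileStashed() {
    TxnGauges g;
    ParticipantModel p(g);
    p.beginOrContinue();
    p.unstash();
    p.stash();
    assert(g.active == 0 && g.inactive == 1);
    p.abort();
    return g;
}
```

In both paths the counts return to zero, because by the time the abort runs the stash flag agrees with the count the transaction is currently held in.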

      However, TransactionParticipant::abortArbitraryTransaction can be called outside of a checked out Session, and so the following sequence can happen, which causes the metrics to be incorrect for a short period:

      // inactive: 0
      // active: 0
      
      // Starts *new* transaction; increments inactive count
      Thread 1: TransactionParticipant::beginOrContinue
      
      // inactive: 1
      // active: 0
      
      // TxnResources have not been created, so _txnResourceStash is boost::none; interprets this as meaning the transaction is active and decrements active count
      Thread 2: TransactionParticipant::abortArbitraryTransaction
      
      // inactive: 1 <----- metric incorrect
      // active: -1 <----- metric incorrect
      
      Thread 1: TransactionParticipant::unstashTransactionResources
      
      // inactive: 0 <----- metric remedied
      // active: 0 <----- metric remedied

      However, if TransactionParticipant::unstashTransactionResources throws before calling TransactionMetricsObserver::onUnstash, for example by timing out waiting to acquire the GlobalLock, then the inactive and active counts may remain permanently incorrect.
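
The interleaving above, with and without the failing unstash, can be replayed deterministically on one thread. The sketch below is an illustration under assumptions, not the server implementation: `TxnGauges` and `replayInterleaving` are hypothetical names, the two threads are simulated by running their steps in the order listed in the ticket, and a `std::runtime_error` stands in for whatever exception the lock timeout would raise.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the server-wide active/inactive counts.
struct TxnGauges {
    long long active = 0;
    long long inactive = 0;
};

// Replays the ticket's interleaving in order on a single thread.
// unstashThrows == true models unstashTransactionResources failing
// before it reaches onUnstash.
TxnGauges replayInterleaving(bool unstashThrows) {
    TxnGauges g;
    const bool hasStash = false;  // models _txnResourceStash == boost::none

    // Thread 1: beginOrContinue -> onStart
    ++g.inactive;
    assert(g.inactive == 1 && g.active == 0);

    // Thread 2: abortArbitraryTransaction sees no stash and assumes "active"
    if (hasStash) {
        --g.inactive;
    } else {
        --g.active;
    }
    assert(g.inactive == 1 && g.active == -1);

    // Thread 1: unstashTransactionResources
    try {
        if (unstashThrows) {
            throw std::runtime_error("timed out acquiring lock");  // assumed failure
        }
        // onUnstash: this is the step that happens to cancel the earlier error
        --g.inactive;
        ++g.active;
    } catch (const std::runtime_error&) {
        // onUnstash never ran, so nothing corrects the counts
    }
    return g;
}
```

Calling `replayInterleaving(false)` ends with both counts at zero, while `replayInterleaving(true)` leaves inactive at 1 and active at -1, which is the permanent skew described above.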

            Assignee:
            lingzhi.deng@mongodb.com Lingzhi Deng
            Reporter:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)