Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-40110

ClusterCursorManager::CursorEntry::isKillPending() should not call checkForInterrupt

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2, v4.0
    • Sprint:
      Query 2019-04-08, Query 2019-04-22, Query 2019-05-06, Query 2019-05-20, Query 2019-06-03, Query 2019-06-17, Query 2019-07-01, Query 2019-07-15, Query 2019-07-29, Query 2019-08-12
    • Linked BF Score:
      30

      Description

      isKillPending()

                bool isKillPending() const {
                    // A cursor is kill pending if it's checked out by an OperationContext that was
                    // interrupted.
                    return _operationUsingCursor &&
                        !_operationUsingCursor->checkForInterruptNoAssert().isOK();
                }
      

      is calling checkForInterruptNoAssert.

      That's a problem because:

      • the rule for opCtx is that the owning thread can call all methods, and external threads can call some methods, while holding the client lock
        • isKillPending is called from ClusterCursorManager::stats, which is called by ftdc, which doesn't hold any client locks
      • checkForInterruptNoAssert isn't meant to be called from off thread. opCtx->isKillPending() is the way to do that
      • checkForInterrupt can actually invoke markKilled, if the op is now timed out
      • markKilled does a dance with waitForConditionOrInterrupt that, when the latter is active, does:
        • an unlock of the client lock. This probably kicks the system into an unanticipated state
        • a lock of the mutex specified in waitForConditionOrInterrupt
        • a lock of the client lock
        • Setting the killcode
        • unlocking the mutex specified in waitForConditionOrInterrupt

      I think this doesn't usually come up because the ops using cluster cursors usually haven't exceeded maxtimems. That may be an avenue for an isolated repro.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              ian.boros Ian Boros
              Reporter:
              jason.carey Jason Carey
              Participants:
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: