Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-40110

ClusterCursorManager::CursorEntry::isKillPending() should not call checkForInterrupt

    • Fully Compatible
    • ALL
    • v4.2, v4.0
    • Query 2019-04-08, Query 2019-04-22, Query 2019-05-06, Query 2019-05-20, Query 2019-06-03, Query 2019-06-17, Query 2019-07-01, Query 2019-07-15, Query 2019-07-29, Query 2019-08-12
    • 30

      isKillPending()

                bool isKillPending() const {
                    // A cursor is kill pending if it's checked out by an OperationContext that was
                    // interrupted.
                    return _operationUsingCursor &&
                        !_operationUsingCursor->checkForInterruptNoAssert().isOK();
                }
      

      is calling checkForInterruptNoAssert.

      That's a problem because:

      • the rule for opCtx is that the owning thread can call all methods, and external threads can call some methods, while holding the client lock
        • isKillPending is called from ClusterCursorManager::stats, which is called by ftdc, which doesn't hold any client locks
      • checkForInterruptNoAssert isn't meant to be called from off thread. opCtx->isKillPending() is the way to do that
      • checkForInterrupt can actually invoke markKilled, if the op is now timed out
      • markKilled does a dance with waitForConditionOrInterrupt that, when the latter is active, does:
        • an unlock of the client lock. This probably kicks the system into an unanticipated state
        • a lock of the mutex specified in waitForConditionOrInterrupt
        • a lock of the client lock
        • Setting the killcode
        • unlocking the mutex specified in waitForConditionOrInterrupt

      I think this doesn't usually come up because the ops using cluster cursors usually haven't exceeded maxtimems. That may be an avenue for an isolated repro.

            Assignee:
            ian.boros@mongodb.com Ian Boros
            Reporter:
            mira.carey@mongodb.com Mira Carey
            Votes:
            1 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: