Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-40110

ClusterCursorManager::CursorEntry::isKillPending() should not call checkForInterrupt

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2, v4.0
    • Sprint:
      Query 2019-04-08, Query 2019-04-22, Query 2019-05-06, Query 2019-05-20, Query 2019-06-03, Query 2019-06-17, Query 2019-07-01, Query 2019-07-15, Query 2019-07-29, Query 2019-08-12
    • Linked BF Score:
      30

      Description

      isKillPending()

                bool isKillPending() const {
                    // A cursor is kill pending if it's checked out by an OperationContext that was
                    // interrupted.
                    return _operationUsingCursor &&
                        !_operationUsingCursor->checkForInterruptNoAssert().isOK();
                }
      

      is calling checkForInterruptNoAssert.

      That's a problem because:

      • the rule for opCtx is that the owning thread can call all methods, and external threads can call some methods, while holding the client lock
        • isKillPending is called from ClusterCursorManager::stats, which is called by ftdc, which doesn't hold any client locks
      • checkForInterruptNoAssert isn't meant to be called from off thread. opCtx->isKillPending() is the way to do that
      • checkForInterrupt can actually invoke markKilled, if the op is now timed out
      • markKilled does a dance with waitForConditionOrInterrupt that, when the latter is active, does:
        • an unlock of the client lock. This probably kicks the system into an unanticipated state
        • a lock of the mutex specified in waitForConditionOrInterrupt
        • a lock of the client lock
        • Setting the killcode
        • unlocking the mutex specified in waitForConditionOrInterrupt

      I think this doesn't usually come up because the ops using cluster cursors usually haven't exceeded maxtimems. That may be an avenue for an isolated repro.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: