Core Server / SERVER-13495

Concurrent GETMORE and KILLCURSORS operations can cause race condition and server crash

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 2.6.1, 2.7.0
    • Affects Version/s: 2.6.0-rc3
    • Component/s: Stability
    • Labels: None
    • Operating System: ALL

      Run Motor's test_del_on_main_greenlet test. Motor kills a cursor on one connection while, on another connection, it repeatedly issues OP_GETMORE with that cursorId to detect when the cursor has died.

      Issue Status as of April 15, 2014

      ISSUE SUMMARY
      Issuing an OP_GETMORE at the same time as an OP_KILLCURSORS for the same cursor may, in very rare cases, trigger a race condition in which the server dereferences a bad pointer and crashes.

      USER IMPACT
      Very low, but the bug can crash the running process.

      WORKAROUNDS
      None

      RESOLUTION
      In CollectionCursorCache, ClientCursor::_pinValue needs to be guarded by a mutex.
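
      The idea behind the resolution can be illustrated with a minimal sketch. This is not the server's code: SketchCursor, SketchCursorCache, raceOnce, and the policy of refusing to kill a pinned cursor are all hypothetical names and assumptions made for illustration. The only point shown is that the pin state is read and written exclusively under one mutex, so a getMore-like thread and a killCursors-like thread cannot interleave on it.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a server-side cursor; NOT MongoDB's ClientCursor.
struct SketchCursor {
    bool pinned = false;  // plays the role of the pin value named in the resolution
};

// Hypothetical stand-in for a per-collection cursor cache.
// Every read or write of `pinned` happens while holding _mutex.
class SketchCursorCache {
public:
    void registerCursor(long long id) {
        std::lock_guard<std::mutex> lk(_mutex);
        _cursors[id] = SketchCursor();
    }

    // getMore path: pin the cursor so it cannot be destroyed while in use.
    // Returns false if the cursor no longer exists or is already pinned.
    bool pin(long long id) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _cursors.find(id);
        if (it == _cursors.end() || it->second.pinned) return false;
        it->second.pinned = true;
        return true;
    }

    // Releases a pin previously taken with pin().
    void unpin(long long id) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _cursors.find(id);
        assert(it != _cursors.end() && it->second.pinned);
        it->second.pinned = false;
    }

    // killCursors path: destroy the cursor only if no getMore holds a pin
    // (refusing the kill while pinned is an assumption of this sketch).
    bool eraseIfUnpinned(long long id) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _cursors.find(id);
        if (it == _cursors.end() || it->second.pinned) return false;
        _cursors.erase(it);
        return true;
    }

    bool contains(long long id) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _cursors.count(id) != 0;
    }

private:
    std::mutex _mutex;
    std::map<long long, SketchCursor> _cursors;
};

// Runs one getMore-like thread and one killCursors-like thread against the
// same cursor id. Returns true if the cursor is gone once both have finished.
inline bool raceOnce(long long id, int rounds) {
    SketchCursorCache cache;
    cache.registerCursor(id);
    std::thread getMore([&] {
        // Stop after `rounds` iterations or as soon as the cursor has died.
        for (int i = 0; i < rounds && cache.pin(id); ++i) {
            cache.unpin(id);
        }
    });
    std::thread kill([&] {
        // Retry until the cursor is unpinned and can be erased.
        while (!cache.eraseIfUnpinned(id)) {
            std::this_thread::yield();
        }
    });
    getMore.join();
    kill.join();
    return !cache.contains(id);
}
```

      Calling raceOnce(189743471636LL, 1000) repeatedly mimics the pattern in this ticket: one connection polls the cursor while another kills it. In the sketch, the kill either wins before a pin is taken or waits until the pin is released.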

      AFFECTED VERSIONS
      Version 2.6.0 is affected by this bug.

      PATCHES
      The patch is included in the 2.6.1 production release.

      Original description

      While I was running Motor's test suite, a primary running version v2.6.0-rc4-pre- (hash 8a71a0) crashed while killing a cursor. The log ends:

      2014-04-04T22:22:13.936+0000 [conn224] query motor_test.test_collection planSummary: COLLSCAN cursorid:189743471636 ntoreturn:0 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:380 nreturned:101 reslen:1434 0ms
      2014-04-04T22:22:13.937+0000 [conn224] motor_test.test_collection warning assertion failure false src/mongo/db/clientcursor.cpp 127
      2014-04-04T22:22:13.937+0000 [conn34] getmore motor_test.test_collection cursorid:189743471636 ntoreturn:0 keyUpdates:0 numYields:0 locks(micros) r:25 nreturned:0 reslen:20 0ms
      2014-04-04T22:22:13.937+0000 [conn34] run command motor_test.$cmd { delete: "test_collection", deletes: [ { q: {}, limit: 0 } ] }
      2014-04-04T22:22:13.937+0000 [conn34] parseAndRemoveImpersonatedUserField: command: { delete: "test_collection", deletes: [ { q: {}, limit: 0 } ] }
      2014-04-04T22:22:13.943+0000 [conn224] motor_test.test_collection 0xe63563 0xe1de60 0xe067e8 0x9337d8 0x91023c 0x9105b1 0xaa52cf 0xaaa4e0 0x84955b 0xe2a3fc 0x7f8676c52f18 0x7f867591ae0d
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xe63563]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo10logContextEPKc+0x190) [0xe1de60]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x118) [0xe067e8]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo12ClientCursorD1Ev+0xf8) [0x9337d8]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo19GlobalCursorIdCache11eraseCursorExb+0x26c) [0x91023c]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo21CollectionCursorCache29eraseCursorGlobalIfAuthorizedEiPx+0x31) [0x9105b1]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo19receivedKillCursorsERNS_7MessageE+0xcf) [0xaa52cf]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x5d0) [0xaaa4e0]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x8b) [0x84955b]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x38c) [0xe2a3fc]
       /lib64/libpthread.so.0(+0x7f18) [0x7f8676c52f18]
       /lib64/libc.so.6(clone+0x6d) [0x7f867591ae0d]
      2014-04-04T22:22:13.943+0000 [conn224] SEVERE: Invalid access at address: 0x17
      2014-04-04T22:22:13.956+0000 [conn224] SEVERE: Got signal: 11 (Segmentation fault).
      Backtrace:0xe63563 0xe630b8 0xe63156 0x7f8676c5a5b0 0x9337e4 0x91023c 0x9105b1 0xaa52cf 0xaaa4e0 0x84955b 0xe2a3fc 0x7f8676c52f18 0x7f867591ae0d
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xe63563]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod() [0xe630b8]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod() [0xe63156]
       /lib64/libpthread.so.0(+0xf5b0) [0x7f8676c5a5b0]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo12ClientCursorD1Ev+0x104) [0x9337e4]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo19GlobalCursorIdCache11eraseCursorExb+0x26c) [0x91023c]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo21CollectionCursorCache29eraseCursorGlobalIfAuthorizedEiPx+0x31) [0x9105b1]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo19receivedKillCursorsERNS_7MessageE+0xcf) [0xaa52cf]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x5d0) [0xaaa4e0]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x8b) [0x84955b]
       /mnt/jenkins/mongodb/unstable/unstable-release/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x38c) [0xe2a3fc]
       /lib64/libpthread.so.0(+0x7f18) [0x7f8676c52f18]
       /lib64/libc.so.6(clone+0x6d) [0x7f867591ae0d]
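
      The frames above use mangled C++ symbol names. As a reading aid, here is a minimal sketch for demangling them, assuming a GCC or Clang toolchain that provides <cxxabi.h>; the helper name demangle is hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium C++ ABI symbol name.
// Returns the input unchanged if it is not a valid mangled name.
inline std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        return mangled;
    }
    assert(out != nullptr);  // status 0 means a buffer was returned
    std::string result(out);
    std::free(out);  // the buffer is allocated with malloc
    return result;
}
```

      For example, demangle("_ZN5mongo12ClientCursorD1Ev") yields mongo::ClientCursor::~ClientCursor(), the frame in which both the failed assertion and the segmentation fault are reported, reached from mongo::receivedKillCursors(mongo::Message&).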
      

        1. db27017.log
          3.96 MB
        2. SERVER-13495.py
          1 kB

            Assignee: Eliot Horowitz (Inactive)
            Reporter: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Votes: 0
            Watchers: 9
