Core Server / SERVER-20866

Race condition in oplog insert transaction rollback

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.2
    • Fix Version/s: 3.0.9
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      QuInt E (01/11/16)

      Description

      Issue Status as of Dec 03, 2015

      ISSUE SUMMARY
      When using the MMAPv1 storage engine, a race condition during the transaction rollback of oplog insertions may result in a crash.

      USER IMPACT
      MongoDB may crash and corrupt the oplog. After the crash, a new primary may not be elected. Heavy insert loads increase the likelihood of hitting this race condition.

      This behavior has been observed when balancing a collection that has large objects and when the chunk migration "source" mongod is under heavy load.

      WORKAROUNDS
      There are no workarounds for this issue.

      Following a crash, execute

      db.shutdownServer({force:true})

      to ensure that a re-election occurs.

      AFFECTED VERSIONS
      MongoDB 3.0.0 through 3.0.8

      FIX VERSION
      The fix is included in the 3.0.9 release.

      Original Description
      After we added a shard, we got this crash 3 times. The node becomes useless, and we have to manually elect a new primary and restore the data.

      After investigating, we found that the oplog collection is corrupt (validate() returns an error).
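
The check described above could be scripted roughly as follows. This is a minimal sketch, not something taken from the original report: the helper name `checkOplog` and the shape of its return value are made up for illustration, and it assumes a replica set member whose oplog is the `oplog.rs` collection in the `local` database. It uses the `validate` database command and reports a failure both when the command itself errors and when it returns `valid: false`.

```javascript
// Hypothetical helper (not from the original report): run `validate` on the
// replica-set oplog and summarize the outcome.
// `conn` is any object exposing getSiblingDB(), such as the shell's global `db`.
function checkOplog(conn) {
  // Assumption: the oplog lives in local.oplog.rs (replica set member).
  var local = conn.getSiblingDB("local");
  var res = local.runCommand({ validate: "oplog.rs", full: true });

  // Treat both a failed command (ok != 1) and valid:false as unhealthy.
  var healthy = Number(res.ok) === 1 && res.valid === true;

  return {
    healthy: healthy,
    errors: res.errors || [],   // per-collection problems, if reported
    errmsg: res.errmsg || null  // set when the command itself failed
  };
}

// In the mongo shell, connected to the affected node:
//   printjson(checkOplog(db));
```

A full validation scans the whole collection, so on a large oplog it is best run on a node that has already been taken out of service.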

      This is the crash:

      mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6b4a9]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf0bca1]
       mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xef0eff]
       mongod(+0xAF0FAC) [0xef0fac]
       mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3DF) [0x84feff]
       mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xB8) [0x92de38]
       mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x518) [0xd621b8]
       mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKNS_9DocWriterEb+0x76) [0xd5b0f6]
       mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextEPKNS_9DocWriterEb+0x5C) [0x92e8dc]
       mongod(+0x8550A6) [0xc550a6]
       mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x161) [0xc537b1]
       mongod(_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE+0x6AC) [0xa567fc]
       mongod(_ZN5mongo11UpdateStage4workEPm+0x3E5) [0xa56fb5]
       mongod(_ZN5mongo12PlanExecutor18getNextSnapshottedEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0xA4) [0xbed174]
       mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x34) [0xbed524]
       mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x3D) [0xbedb5d]
       mongod(_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE+0x71D) [0x9ce03d]
       mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestERKNS_19WriteConcernOptionsEPSt6vectorIPNS_19BatchedUpsertDetailESaIS9_EEPS7_IPNS_16WriteErrorDetailESaISE_EE+0x23C) [0x9cf82c]
       mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x37B) [0x9cfd6b]
       mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x9d270d]
       mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9f5e34]
       mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC1D) [0x9f6dbd]
       mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9f7acb]
       mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x746) [0xbbb836]
       mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xad2190]
       mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x829cad]
       mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf1f09b]
       libpthread.so.0(+0x7DF3) [0x7f36287e9df3]
       libc.so.6(clone+0x6D) [0x7f362729d1bd]
      -----  END BACKTRACE  -----
      

        Attachments

        1. another_crash (18 kB)
        2. crash_308_debug (218 kB)
        3. monogo_crash.zip (4.33 MB)
        4. perfect_example (517 kB)
