ISSUE SUMMARY
When using the MMAPv1 storage engine, a race condition during the transaction rollback of oplog insertions may result in a crash.
USER IMPACT
MongoDB may crash and corrupt the oplog, and a primary re-election may not occur. Heavy insert loads make this race condition more likely to be triggered.
This behavior has been observed when balancing a collection that has large objects and when the chunk migration "source" mongod is under heavy load.
WORKAROUNDS
There are no workarounds for this issue.
Following a crash, execute the following against the admin database of the affected mongod
db.shutdownServer({force:true})
to ensure that a re-election occurs.
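After the forced shutdown, one way to confirm that a new primary was actually elected is to inspect rs.status() from a surviving member. The sketch below assumes only the standard members[].name and members[].stateStr fields of the rs.status() output; the helper name findPrimary is hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper: return the host name of the member that reports
// PRIMARY in an rs.status() document, or null if no member does.
function findPrimary(status) {
  var members = status.members || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === "PRIMARY") {
      return members[i].name;
    }
  }
  return null;
}

// Usage in the mongo shell, connected to a surviving member
// (rs.status() is the real shell helper; the check itself is a sketch):
// var primary = findPrimary(rs.status());
// print(primary ? "current primary: " + primary : "no primary elected yet");
```

If the helper returns null, or still names the member that crashed, the re-election has not completed yet.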
AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.8
FIX VERSION
The fix is included in the 3.0.9 release.
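To check whether a given mongod matches the affected configuration described above (MMAPv1, versions 3.0.0 through 3.0.8), a small sketch like the following could be used. It assumes the version string comes from db.version() and the engine name from db.serverStatus().storageEngine.name (reported as "mmapv1" for MMAPv1); the helper name isAffected is hypothetical. Because it ignores any suffix after major.minor.patch, pre-release builds such as "3.0.0-rc9" are treated as affected.

```javascript
// Hypothetical helper: does this mongod match the affected configuration
// (MMAPv1 storage engine, versions 3.0.0 through 3.0.8; fixed in 3.0.9)?
function isAffected(version, engineName) {
  // Take the leading "major.minor.patch" and ignore suffixes such as "-rc0".
  var m = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (m === null) {
    return false; // unrecognized version string
  }
  var major = Number(m[1]);
  var minor = Number(m[2]);
  var patch = Number(m[3]);
  return engineName === "mmapv1" &&
         major === 3 && minor === 0 && patch <= 8;
}

// Usage in the mongo shell, connected to the mongod to check:
// isAffected(db.version(), db.serverStatus().storageEngine.name)
```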
Original Description
After we added a shard, we hit this crash three times. Each time, the node became unusable, and we had to manually elect a new primary and restore the data.
After investigating, we found that the oplog collection was corrupt (validate() returns an error).
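A minimal sketch of how the oplog of a replica set member (local.oplog.rs) can be validated from the mongo shell follows. validate(true) requests a full validation, which can be slow on a large oplog and may block writes to the collection while it runs. The summarizeValidation helper is hypothetical; it only reads the valid and errors fields of the validate output.

```javascript
// Hypothetical helper: reduce a validate() result to the fields of interest.
function summarizeValidation(res) {
  return {
    valid: res.valid === true,
    errors: Array.isArray(res.errors) ? res.errors : []
  };
}

// Usage in the mongo shell, connected to the suspect member:
// var res = db.getSiblingDB("local").oplog.rs.validate(true); // full validation
// printjson(summarizeValidation(res));
```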
This is the crash:
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6b4a9]
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf0bca1]
mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xef0eff]
mongod(+0xAF0FAC) [0xef0fac]
mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3DF) [0x84feff]
mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xB8) [0x92de38]
mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x518) [0xd621b8]
mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKNS_9DocWriterEb+0x76) [0xd5b0f6]
mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextEPKNS_9DocWriterEb+0x5C) [0x92e8dc]
mongod(+0x8550A6) [0xc550a6]
mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x161) [0xc537b1]
mongod(_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE+0x6AC) [0xa567fc]
mongod(_ZN5mongo11UpdateStage4workEPm+0x3E5) [0xa56fb5]
mongod(_ZN5mongo12PlanExecutor18getNextSnapshottedEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0xA4) [0xbed174]
mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x34) [0xbed524]
mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x3D) [0xbedb5d]
mongod(_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE+0x71D) [0x9ce03d]
mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestERKNS_19WriteConcernOptionsEPSt6vectorIPNS_19BatchedUpsertDetailESaIS9_EEPS7_IPNS_16WriteErrorDetailESaISE_EE+0x23C) [0x9cf82c]
mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x37B) [0x9cfd6b]
mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x9d270d]
mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9f5e34]
mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC1D) [0x9f6dbd]
mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9f7acb]
mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x746) [0xbbb836]
mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xad2190]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x829cad]
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf1f09b]
libpthread.so.0(+0x7DF3) [0x7f36287e9df3]
libc.so.6(clone+0x6D) [0x7f362729d1bd]
----- END BACKTRACE -----