Details
Description
ISSUE SUMMARY
When using the MMAPv1 storage engine, a race condition during the transaction rollback of oplog insertions may result in a crash.
USER IMPACT
MongoDB may crash and corrupt the oplog. Primary re-election may not occur. Heavy insert loads increase the likelihood of hitting this race condition.
This behavior has been observed when balancing a collection that has large objects and when the chunk migration "source" mongod is under heavy load.
WORKAROUNDS
There are no workarounds for this issue.
Following a crash, execute:
db.shutdownServer({force:true})
AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.8
FIX VERSION
The fix is included in the 3.0.9 release.
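The affected range above can be turned into a quick check. The following is only a minimal sketch: the helper names `parse_version` and `is_affected` are hypothetical, and it assumes the version is given as a plain "X.Y.Z" string (any pre-release suffix is ignored) and that the storage engine is reported by name, for example "mmapv1" or "wiredTiger".

```python
import re

# Range taken from the AFFECTED VERSIONS section of this ticket.
AFFECTED_MIN = (3, 0, 0)
AFFECTED_MAX = (3, 0, 8)


def parse_version(version):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints.

    Anything after the third component (such as '-rc0') is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_affected(version, storage_engine):
    """Return True if the version and storage engine match this ticket."""
    # The issue is specific to MMAPv1; other engines are not in scope.
    if storage_engine.lower() != "mmapv1":
        return False
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX
```

For instance, `is_affected("3.0.8", "mmapv1")` falls inside the range, while `is_affected("3.0.9", "mmapv1")` does not because 3.0.9 contains the fix.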
Original Description
After we added a shard, we got this crash 3 times. The node becomes useless, and we have to manually elect a new primary and restore the data.
After investigation, we found that the oplog collection is corrupted (validate() returns an error).
This is the crash:
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6b4a9]
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf0bca1]
mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xef0eff]
mongod(+0xAF0FAC) [0xef0fac]
mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3DF) [0x84feff]
mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xB8) [0x92de38]
mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x518) [0xd621b8]
mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKNS_9DocWriterEb+0x76) [0xd5b0f6]
mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextEPKNS_9DocWriterEb+0x5C) [0x92e8dc]
mongod(+0x8550A6) [0xc550a6]
mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x161) [0xc537b1]
mongod(_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE+0x6AC) [0xa567fc]
mongod(_ZN5mongo11UpdateStage4workEPm+0x3E5) [0xa56fb5]
mongod(_ZN5mongo12PlanExecutor18getNextSnapshottedEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0xA4) [0xbed174]
mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x34) [0xbed524]
mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x3D) [0xbedb5d]
mongod(_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE+0x71D) [0x9ce03d]
mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestERKNS_19WriteConcernOptionsEPSt6vectorIPNS_19BatchedUpsertDetailESaIS9_EEPS7_IPNS_16WriteErrorDetailESaISE_EE+0x23C) [0x9cf82c]
mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x37B) [0x9cfd6b]
mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x9d270d]
mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9f5e34]
mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC1D) [0x9f6dbd]
mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9f7acb]
mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x746) [0xbbb836]
mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xad2190]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x829cad]
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf1f09b]
libpthread.so.0(+0x7DF3) [0x7f36287e9df3]
libc.so.6(clone+0x6D) [0x7f362729d1bd]
----- END BACKTRACE -----