- Type: Bug
- Resolution: Incomplete
- Priority: Critical - P2
- Affects Version/s: 2.2.1
- Component/s: None
- Linux
We're running a 5-node replica set (including 1 hidden member). This morning, 3 nodes (including the primary) all crashed at the same time!
The log file for the primary contains:
Fri Dec 21 15:45:45 [repl writer worker 4] ERROR: writer worker caught exception: E11000 duplicate key error index: songza.song_bans.$station_id_1_song_id_1 dup key: { : 1382507, : 10983572 } on: { ts: Timestamp 1356103732000|9, h: -4292312148168796626, v: 2, op: "i", ns: "songza.song_bans", o: { _id: ObjectId('50d4803398896c3d577f54c4'), ban_time: new Date(1356103731894), song_id: 10983572, user_id: 1079677, station_id: 1382507 } }
Fri Dec 21 15:45:45 [repl writer worker 4] Fatal Assertion 16360
0xaf8c41 0xabe223 0x99b83e 0xacc4cd 0xb3ec79 0x7f671505e9ca 0x7f6714405cdd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xabe223]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x13e) [0x99b83e]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0xacc4cd]
/usr/bin/mongod() [0xb3ec79]
/lib/libpthread.so.0(+0x69ca) [0x7f671505e9ca]
/lib/libc.so.6(clone+0x6d) [0x7f6714405cdd]
Fri Dec 21 15:45:45 [repl writer worker 4]
***aborting after fassert() failure
Fri Dec 21 15:45:45 Got signal: 6 (Aborted).
Fri Dec 21 15:45:45 Backtrace:
0xaf8c41 0x5586c9 0x7f6714352af0 0x7f6714352a75 0x7f67143565c0 0xabe25e 0x99b83e 0xacc4cd 0xb3ec79 0x7f671505e9ca 0x7f6714405cdd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5586c9]
/lib/libc.so.6(+0x33af0) [0x7f6714352af0]
/lib/libc.so.6(gsignal+0x35) [0x7f6714352a75]
/lib/libc.so.6(abort+0x180) [0x7f67143565c0]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xabe25e]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x13e) [0x99b83e]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0xacc4cd]
/usr/bin/mongod() [0xb3ec79]
/lib/libpthread.so.0(+0x69ca) [0x7f671505e9ca]
/lib/libc.so.6(clone+0x6d) [0x7f6714405cdd]
Attempts to restart or repair this node have not been successful. We were able to restart the two crashed secondaries, and one of them eventually stepped up to become the primary.
I'm attaching log files from the three nodes in question. db1a is the hidden member. db1c was the primary (and is the one which will not come back up). db1d was (and continues to be) one of the secondaries.
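In case it helps anyone comparing the three attached logs, here is a minimal Python sketch that pulls the duplicate-key errors and fatal assertion codes out of mongod log lines. It assumes the log format matches the excerpt above; the function name `summarize_log` and the regular expressions are illustrative only and are not part of any MongoDB tooling.

```python
import re

# Assumption: lines look like the 2.2.1 excerpt quoted in this report.
DUP_KEY_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d) "
    r"\[(?P<thread>[^\]]+)\] ERROR: writer worker caught exception: "
    r"E11000 duplicate key error index: (?P<index>\S+)\s+"
    r"dup key: (?P<key>\{.*?\}) on:"
)
FASSERT_RE = re.compile(r"Fatal Assertion (?P<code>\d+)")


def summarize_log(lines):
    """Return (duplicate-key errors, fatal assertion codes) found in the lines."""
    dup_errors, fasserts = [], []
    for line in lines:
        match = DUP_KEY_RE.search(line)
        if match:
            # Keep timestamp, thread, index name, and the duplicated key values.
            dup_errors.append(match.groupdict())
            continue
        match = FASSERT_RE.search(line)
        if match:
            fasserts.append(int(match.group("code")))
    return dup_errors, fasserts


# Two lines copied from the primary's log excerpt above.
sample = [
    'Fri Dec 21 15:45:45 [repl writer worker 4] ERROR: writer worker caught exception: E11000 duplicate key error index: songza.song_bans.$station_id_1_song_id_1 dup key: { : 1382507, : 10983572 } on: { ts: Timestamp 1356103732000|9, h: -4292312148168796626, v: 2, op: "i", ns: "songza.song_bans", o:',
    "Fri Dec 21 15:45:45 [repl writer worker 4] Fatal Assertion 16360",
]

dup_errors, fasserts = summarize_log(sample)
for err in dup_errors:
    print(err["when"], err["index"], err["key"])
print("fatal assertions:", fasserts)
```

To run it against a real file, pass an open file object instead of `sample`, for example `summarize_log(open(path))` with `path` pointing at one of the attached logs. Running it on each node's log should show whether all three nodes hit the same index and key.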