Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Critical - P2
- Fix Version/s: None
- Affects Version/s: 2.4.6
- Component/s: None
- Environment: Ubuntu 12.04 on AWS
- Operating System: Linux
Description
We have a three-node replica set. Two of the nodes are configured with the same priority (1), while the third is configured with priority 0 so that it is never promoted to primary (it is the one we use for backups).
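For reference, a minimal sketch of that configuration in the mongo shell (the set name and hostnames are hypothetical, not taken from our deployment):

rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "node1.example.com:27017", priority: 1 },
        { _id: 1, host: "node2.example.com:27017", priority: 1 },
        // priority 0 prevents this member from ever being elected primary
        { _id: 2, host: "node3.example.com:27017", priority: 0 }
    ]
})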
As of last night, the two secondaries have been crashing continuously while replicating writes, with the following error logged on both servers:
Wed Sep 25 05:28:16.211 [repl writer worker 2] ERROR: writer worker caught exception: E11000 duplicate key error index: marketshare.Application.$name_1 dup key: { : "TestingRDS" } on: { ts: Timestamp 1380112101000|1, h: -5880003146554201345, v: 2, op: "u", ns: "marketshare.Application", o2: { _id: ObjectId('5242be375f6ffb0c37145385') }, o: { $set: { boxes.0: { box_type_name: "OPTIMIZERMS-APPV4", updated: "2013-09-25 12:28:22.855168", <rest of object>...}}}
Wed Sep 25 05:28:16.211 [repl writer worker 2] Fatal Assertion 16360
0xdddd81 0xd9dc13 0xc26bfc 0xdab721 0xe26609 0x7ff9f4923e9a 0x7ff9f3c36ccd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd9dc13]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
/usr/bin/mongod() [0xe26609]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7ff9f4923e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ff9f3c36ccd]
Wed Sep 25 05:28:16.215 [repl writer worker 2]
***aborting after fassert() failure
Wed Sep 25 05:28:16.215 Got signal: 6 (Aborted).
Wed Sep 25 05:28:16.219 Backtrace:
0xdddd81 0x6d0d29 0x7ff9f3b794a0 0x7ff9f3b79425 0x7ff9f3b7cb8b 0xd9dc4e 0xc26bfc 0xdab721 0xe26609 0x7ff9f4923e9a 0x7ff9f3c36ccd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7ff9f3b794a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7ff9f3b79425]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7ff9f3b7cb8b]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd9dc4e]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721]
/usr/bin/mongod() [0xe26609]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7ff9f4923e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7ff9f3c36ccd]
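Since the assertion comes from the unique name_1 index, here is a sketch of one way to check the primary for colliding values (the collection and field names are taken from the log line above):

use marketshare
db.Application.aggregate([
    // group on the indexed field and keep only values that appear more than once
    { $group: { _id: "$name", count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
])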
After rebooting both secondaries, the cluster was able to re-establish quorum for about 15 minutes, but after that the issue reproduced. When we logged in to the instances directly, we noticed that the indexes were correct on the primary but completely empty on the secondary.
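The index comparison can be repeated from the mongo shell on each member (getIndexes() and validate() are standard shell helpers; rs.slaveOk() is needed to read from a secondary on 2.4):

rs.slaveOk()
use marketshare
// lists the index definitions this member knows about
db.Application.getIndexes()
// full validation also cross-checks index entry counts against document counts
db.Application.validate(true)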