Details
Type: Bug
Status: Closed
Priority: Major - P3
Resolution: Incomplete
Affects Version/s: 2.4.6
Fix Version/s: None
Environment: Ubuntu 10.04.4, 32 x Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 127 GB memory
Operating System: Linux
Description
Hi.
We have a MongoDB replica set:
> rs.conf()
{
    "_id" : "mdb01",
    "version" : 19959,
    "members" : [
        {
            "_id" : 0,
            "host" : "mdb01d:27018"
        },
        {
            "_id" : 1,
            "host" : "mdb01:27018"
        },
        {
            "_id" : 2,
            "host" : "mdb01g:27018"
        },
        {
            "_id" : 3,
            "host" : "mdb-backup01d:27018",
            "votes" : 0,
            "priority" : 0,
            "hidden" : true
        }
    ]
}
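For reference, the backup member (_id 3) is intended as a non-voting, zero-priority, hidden node; a configuration like that is normally produced with a reconfig along the following lines (an illustrative shell session, not the exact commands we ran):

cfg = rs.conf()                    // fetch the current replica set configuration
cfg.members[3].priority = 0        // never eligible to become primary
cfg.members[3].votes = 0           // does not take part in elections
cfg.members[3].hidden = true       // invisible to clients, used only for backups
rs.reconfig(cfg)                   // apply the new configuration on the primary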
Today the secondary nodes of our replica set went down. The log from one of the secondaries:
Wed Nov 20 21:28:40.921 [conn20] replSet RECOVERING
Wed Nov 20 21:28:40.921 [conn20] replSet info voting yea for mdb01e:27018 (1)
Wed Nov 20 21:28:41.055 [conn7] end connection 1.1.1.1:59411 (28 connections now open)
Wed Nov 20 21:28:41.483 [rsHealthPoll] DBClientCursor::init call() failed
Wed Nov 20 21:28:41.484 [rsHealthPoll] replset info rty-mdb-backup01d:27018 heartbeat failed, retrying
Wed Nov 20 21:28:41.485 [rsHealthPoll] replSet info rty-mdb-backup01d:27018 is down (or slow to respond):
Wed Nov 20 21:28:41.485 [rsHealthPoll] replSet member rty-mdb-backup01dt:27018 is now in state DOWN
Wed Nov 20 21:28:41.487 [rsHealthPoll] replSet member rty-mdb01e:27018 is now in state PRIMARY
Wed Nov 20 21:28:41.488 [rsHealthPoll] DBClientCursor::init call() failed
Wed Nov 20 21:28:41.488 [rsHealthPoll] replset info rty-mdb01g:27018 heartbeat failed, retrying
Wed Nov 20 21:28:41.490 [rsHealthPoll] replSet info rty-mdb01g:27018 is down (or slow to respond):
Wed Nov 20 21:28:41.490 [rsHealthPoll] replSet member rty-mdb01g:27018 is now in state DOWN
Wed Nov 20 21:28:42.474 [rsBackgroundSync] replSet syncing to: rty-mdb01e:27018
Wed Nov 20 21:28:42.475 [rsSync] replSet still syncing, not yet to minValid optime 528ce54b:20
Wed Nov 20 21:28:42.565 [rsSync] replSet SECONDARY
Wed Nov 20 21:28:42.648 [repl writer worker 2] ERROR: writer worker caught exception: E11000 duplicate key error index: fotki-bazinga.onetimeJobs.$activeUniqueIdentifier_1 dup key: { : "refreshCounters_{"userId":100544310,"albumId":377350}" } on: { ts: Timestamp 1384965478000|3, h: -6072474236908557566, v: 2, op: "i", ns: "fotki-bazinga.onetimeJobs", o: { _id: { taskId: "refreshCounters", jobId: 1384965478119 }, scheduleTime: new Date(1384965508119), activeUniqueIdentifier: "refreshCounters_{"userId":100544310,"albumId":377350}", parameters: "{"userId":100544310,"albumId":377350}", status: "ready", workers: {}, priority: 20 } }
Wed Nov 20 21:28:42.648 [repl writer worker 2] Fatal Assertion 16360
0xc5e916 0xc23273 0xb14021 0xc30b59 0xc9bbac 0x7f5cb0dd69ca 0x7f5cb017d21d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc5e916]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x63) [0xc23273]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x121) [0xb14021]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x279) [0xc30b59]
/usr/bin/mongod() [0xc9bbac]
/lib/libpthread.so.0(+0x69ca) [0x7f5cb0dd69ca]
/lib/libc.so.6(clone+0x6d) [0x7f5cb017d21d]
Wed Nov 20 21:28:42.651 [repl writer worker 2]

***aborting after fassert() failure

Wed Nov 20 21:28:42.651 Got signal: 6 (Aborted).
Wed Nov 20 21:28:42.653 Backtrace:
0xc5e916 0x70c044 0x7f5cb00c7ba0 0x7f5cb00c7b25 0x7f5cb00cb670 0xc232ae 0xb14021 0xc30b59 0xc9bbac 0x7f5cb0dd69ca 0x7f5cb017d21d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc5e916]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x3c4) [0x70c044]
/lib/libc.so.6(+0x33ba0) [0x7f5cb00c7ba0]
/lib/libc.so.6(gsignal+0x35) [0x7f5cb00c7b25]
/lib/libc.so.6(abort+0x180) [0x7f5cb00cb670]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x9e) [0xc232ae]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x121) [0xb14021]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x279) [0xc30b59]
/usr/bin/mongod() [0xc9bbac]
/lib/libpthread.so.0(+0x69ca) [0x7f5cb0dd69ca]
/lib/libc.so.6(clone+0x6d) [0x7f5cb017d21d]
Our replica set switched to read-only because we lost all secondary nodes, which made the service completely unusable :(
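If it helps with triage, the document behind the E11000 error can be looked up against the activeUniqueIdentifier_1 unique index on a surviving member with something like the following (illustrative only; rs.slaveOk() is needed because that member is no longer primary):

rs.slaveOk()                                              // allow reads on a non-primary member
var jobs = db.getSiblingDB("fotki-bazinga").onetimeJobs   // namespace from the error line above
jobs.getIndexes()                                         // confirm the unique index on activeUniqueIdentifier
jobs.find({ activeUniqueIdentifier: 'refreshCounters_{"userId":100544310,"albumId":377350}' }).pretty()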