Details
Type: Bug
Status: Closed
Priority: Major - P3
Resolution: Incomplete
Affects Version/s: 2.4.6
Fix Version/s: None
Environment: Ubuntu 10.04.4, 32 x Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 127 GB memory
Operating System: Linux
Description
Hi.
We have a MongoDB replica set:
> rs.conf()
{
    "_id" : "mdb01",
    "version" : 19959,
    "members" : [
        {
            "_id" : 0,
            "host" : "mdb01d:27018"
        },
        {
            "_id" : 1,
            "host" : "mdb01:27018"
        },
        {
            "_id" : 2,
            "host" : "mdb01g:27018"
        },
        {
            "_id" : 3,
            "host" : "mdb-backup01d:27018",
            "votes" : 0,
            "priority" : 0,
            "hidden" : true
        }
    ]
}
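For reference, the backup member (_id 3) is intended as a non-voting, zero-priority, hidden node; a configuration like that is normally produced with a reconfig along the following lines (an illustrative shell session, not the exact commands we ran):

cfg = rs.conf()                    // fetch the current replica set configuration
cfg.members[3].priority = 0        // never eligible to become primary
cfg.members[3].votes = 0           // does not take part in elections
cfg.members[3].hidden = true       // invisible to clients, used only for backups
rs.reconfig(cfg)                   // apply the new configuration on the primary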
Today the secondary nodes of our replica set went down. The log from one of the secondaries:
Wed Nov 20 21:28:40.921 [conn20] replSet RECOVERING
Wed Nov 20 21:28:40.921 [conn20] replSet info voting yea for mdb01e:27018 (1)
Wed Nov 20 21:28:41.055 [conn7] end connection 1.1.1.1:59411 (28 connections now open)
Wed Nov 20 21:28:41.483 [rsHealthPoll] DBClientCursor::init call() failed
Wed Nov 20 21:28:41.484 [rsHealthPoll] replset info rty-mdb-backup01d:27018 heartbeat failed, retrying
Wed Nov 20 21:28:41.485 [rsHealthPoll] replSet info rty-mdb-backup01d:27018 is down (or slow to respond):
Wed Nov 20 21:28:41.485 [rsHealthPoll] replSet member rty-mdb-backup01dt:27018 is now in state DOWN
Wed Nov 20 21:28:41.487 [rsHealthPoll] replSet member rty-mdb01e:27018 is now in state PRIMARY
Wed Nov 20 21:28:41.488 [rsHealthPoll] DBClientCursor::init call() failed
Wed Nov 20 21:28:41.488 [rsHealthPoll] replset info rty-mdb01g:27018 heartbeat failed, retrying
Wed Nov 20 21:28:41.490 [rsHealthPoll] replSet info rty-mdb01g:27018 is down (or slow to respond):
Wed Nov 20 21:28:41.490 [rsHealthPoll] replSet member rty-mdb01g:27018 is now in state DOWN
Wed Nov 20 21:28:42.474 [rsBackgroundSync] replSet syncing to: rty-mdb01e:27018
Wed Nov 20 21:28:42.475 [rsSync] replSet still syncing, not yet to minValid optime 528ce54b:20
Wed Nov 20 21:28:42.565 [rsSync] replSet SECONDARY
Wed Nov 20 21:28:42.648 [repl writer worker 2] ERROR: writer worker caught exception: E11000 duplicate key error index: fotki-bazinga.onetimeJobs.$activeUniqueIdentifier_1 dup key: { : "refreshCounters_{"userId":100544310,"albumId":377350}" } on: { ts: Timestamp 1384965478000|3, h: -6072474236908557566, v: 2, op: "i", ns: "fotki-bazinga.onetimeJobs", o: { _id: { taskId: "refreshCounters", jobId: 1384965478119 }, scheduleTime: new Date(1384965508119), activeUniqueIdentifier: "refreshCounters_{"userId":100544310,"albumId":377350}", parameters: "{"userId":100544310,"albumId":377350}", status: "ready", workers: {}, priority: 20 } }
Wed Nov 20 21:28:42.648 [repl writer worker 2] Fatal Assertion 16360
0xc5e916 0xc23273 0xb14021 0xc30b59 0xc9bbac 0x7f5cb0dd69ca 0x7f5cb017d21d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc5e916]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x63) [0xc23273]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x121) [0xb14021]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x279) [0xc30b59]
/usr/bin/mongod() [0xc9bbac]
/lib/libpthread.so.0(+0x69ca) [0x7f5cb0dd69ca]
/lib/libc.so.6(clone+0x6d) [0x7f5cb017d21d]
Wed Nov 20 21:28:42.651 [repl writer worker 2]

***aborting after fassert() failure

Wed Nov 20 21:28:42.651 Got signal: 6 (Aborted).
Wed Nov 20 21:28:42.653 Backtrace:
0xc5e916 0x70c044 0x7f5cb00c7ba0 0x7f5cb00c7b25 0x7f5cb00cb670 0xc232ae 0xb14021 0xc30b59 0xc9bbac 0x7f5cb0dd69ca 0x7f5cb017d21d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc5e916]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x3c4) [0x70c044]
/lib/libc.so.6(+0x33ba0) [0x7f5cb00c7ba0]
/lib/libc.so.6(gsignal+0x35) [0x7f5cb00c7b25]
/lib/libc.so.6(abort+0x180) [0x7f5cb00cb670]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x9e) [0xc232ae]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x121) [0xb14021]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x279) [0xc30b59]
/usr/bin/mongod() [0xc9bbac]
/lib/libpthread.so.0(+0x69ca) [0x7f5cb0dd69ca]
/lib/libc.so.6(clone+0x6d) [0x7f5cb017d21d]
Our replica set switched to read-only because we lost all secondary nodes, which made the service completely unusable :(
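If it helps with triage, the document behind the E11000 error can be looked up against the activeUniqueIdentifier_1 unique index on a surviving member with something like the following (illustrative only; rs.slaveOk() is needed because that member is no longer primary):

rs.slaveOk()                                              // allow reads on a non-primary member
var jobs = db.getSiblingDB("fotki-bazinga").onetimeJobs   // namespace from the error line above
jobs.getIndexes()                                         // confirm the unique index on activeUniqueIdentifier
jobs.find({ activeUniqueIdentifier: 'refreshCounters_{"userId":100544310,"albumId":377350}' }).pretty()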