[SERVER-11802] duplicate key exception in replication Created: 20/Nov/13  Updated: 10/Dec/14  Resolved: 13/Jan/14

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrey Godin Assignee: Matt Dannenberg
Resolution: Incomplete Votes: 0
Labels: crash, replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04.4
32 logical CPUs (model name: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz), 127 GB memory


Operating System: Linux
 Description   

Hi.

We have a MongoDB replica set (RS):

> rs.conf()
{
	"_id" : "mdb01",
	"version" : 19959,
	"members" : [
		{
			"_id" : 0,
			"host" : "mdb01d:27018"
		},
		{
			"_id" : 1,
			"host" : "mdb01:27018"
		},
		{
			"_id" : 2,
			"host" : "mdb01g:27018"
		},
		{
			"_id" : 3,
			"host" : "mdb-backup01d:27018",
			"votes" : 0,
			"priority" : 0,
			"hidden" : true
		}
	]
}

Today, the secondary nodes in our replica set went down with the following error:

Wed Nov 20 21:28:40.921 [conn20] replSet RECOVERING
Wed Nov 20 21:28:40.921 [conn20] replSet info voting yea for mdb01e:27018 (1)
Wed Nov 20 21:28:41.055 [conn7] end connection 1.1.1.1:59411 (28 connections now open)
Wed Nov 20 21:28:41.483 [rsHealthPoll] DBClientCursor::init call() failed
Wed Nov 20 21:28:41.484 [rsHealthPoll] replset info rty-mdb-backup01d:27018 heartbeat failed, retrying
Wed Nov 20 21:28:41.485 [rsHealthPoll] replSet info rty-mdb-backup01d:27018 is down (or slow to respond): 
Wed Nov 20 21:28:41.485 [rsHealthPoll] replSet member rty-mdb-backup01dt:27018 is now in state DOWN
Wed Nov 20 21:28:41.487 [rsHealthPoll] replSet member rty-mdb01e:27018 is now in state PRIMARY
Wed Nov 20 21:28:41.488 [rsHealthPoll] DBClientCursor::init call() failed
Wed Nov 20 21:28:41.488 [rsHealthPoll] replset info rty-mdb01g:27018 heartbeat failed, retrying
Wed Nov 20 21:28:41.490 [rsHealthPoll] replSet info rty-mdb01g:27018 is down (or slow to respond): 
Wed Nov 20 21:28:41.490 [rsHealthPoll] replSet member rty-mdb01g:27018 is now in state DOWN
Wed Nov 20 21:28:42.474 [rsBackgroundSync] replSet syncing to: rty-mdb01e:27018
Wed Nov 20 21:28:42.475 [rsSync] replSet still syncing, not yet to minValid optime 528ce54b:20
Wed Nov 20 21:28:42.565 [rsSync] replSet SECONDARY
Wed Nov 20 21:28:42.648 [repl writer worker 2] ERROR: writer worker caught exception: E11000 duplicate key error index: fotki-bazinga.onetimeJobs.$activeUniqueIdentifier_1  dup key: { : "refreshCounters_{"userId":100544310,"albumId":377350}" } on: { ts: Timestamp 1384965478000|3, h: -6072474236908557566, v: 2, op: "i", ns: "fotki-bazinga.onetimeJobs", o: { _id: { taskId: "refreshCounters", jobId: 1384965478119 }, scheduleTime: new Date(1384965508119), activeUniqueIdentifier: "refreshCounters_{"userId":100544310,"albumId":377350}", parameters: "{"userId":100544310,"albumId":377350}", status: "ready", workers: {}, priority: 20 } }
Wed Nov 20 21:28:42.648 [repl writer worker 2]   Fatal Assertion 16360
0xc5e916 0xc23273 0xb14021 0xc30b59 0xc9bbac 0x7f5cb0dd69ca 0x7f5cb017d21d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc5e916]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x63) [0xc23273]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x121) [0xb14021]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x279) [0xc30b59]
 /usr/bin/mongod() [0xc9bbac]
 /lib/libpthread.so.0(+0x69ca) [0x7f5cb0dd69ca]
 /lib/libc.so.6(clone+0x6d) [0x7f5cb017d21d]
Wed Nov 20 21:28:42.651 [repl writer worker 2] 
 
***aborting after fassert() failure
 
 
Wed Nov 20 21:28:42.651 Got signal: 6 (Aborted).
 
Wed Nov 20 21:28:42.653 Backtrace:
0xc5e916 0x70c044 0x7f5cb00c7ba0 0x7f5cb00c7b25 0x7f5cb00cb670 0xc232ae 0xb14021 0xc30b59 0xc9bbac 0x7f5cb0dd69ca 0x7f5cb017d21d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc5e916]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x3c4) [0x70c044]
 /lib/libc.so.6(+0x33ba0) [0x7f5cb00c7ba0]
 /lib/libc.so.6(gsignal+0x35) [0x7f5cb00c7b25]
 /lib/libc.so.6(abort+0x180) [0x7f5cb00cb670]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x9e) [0xc232ae]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x121) [0xb14021]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x279) [0xc30b59]
 /usr/bin/mongod() [0xc9bbac]
 /lib/libpthread.so.0(+0x69ca) [0x7f5cb0dd69ca]
 /lib/libc.so.6(clone+0x6d) [0x7f5cb017d21d]

Our replica set became read-only because we lost all of the secondary nodes, which made the service completely unusable.
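One plausible reading of why losing the secondaries made the set read-only: in the configuration above only three members vote (the hidden backup member has `votes: 0`), and a primary must be able to reach members holding a strict majority of the votes, here 2 of 3, to remain primary. The sketch below is a simplified model under stated assumptions, not MongoDB's actual election code: each member without a `votes` field is assumed to have 1 vote, and `mdb01d:27018` is chosen as the primary purely for illustration.

```javascript
// Illustrative model of replica-set majority, not MongoDB's election code.
// Assumption: a member without a "votes" field has 1 vote (the 2.4 default).
function votesOf(member) {
  return member.votes === undefined ? 1 : member.votes;
}

// Strict majority of all configured votes: floor(total / 2) + 1.
function majorityNeeded(conf) {
  const total = conf.members.reduce((sum, m) => sum + votesOf(m), 0);
  return Math.floor(total / 2) + 1;
}

// A primary can stay primary only if the members it can reach
// (itself included) hold at least a majority of the votes.
function canKeepPrimary(conf, reachableHosts) {
  const reachable = conf.members
    .filter((m) => reachableHosts.includes(m.host))
    .reduce((sum, m) => sum + votesOf(m), 0);
  return reachable >= majorityNeeded(conf);
}

// The configuration reported by rs.conf() above.
const conf = {
  _id: "mdb01",
  version: 19959,
  members: [
    { _id: 0, host: "mdb01d:27018" },
    { _id: 1, host: "mdb01:27018" },
    { _id: 2, host: "mdb01g:27018" },
    { _id: 3, host: "mdb-backup01d:27018", votes: 0, priority: 0, hidden: true },
  ],
};

console.log(majorityNeeded(conf)); // 2
// Hypothetical primary plus the non-voting hidden member: 1 vote, not enough.
console.log(canKeepPrimary(conf, ["mdb01d:27018", "mdb-backup01d:27018"])); // false
// Hypothetical primary plus one voting secondary: 2 votes, enough.
console.log(canKeepPrimary(conf, ["mdb01d:27018", "mdb01:27018"])); // true
```

Because the hidden backup member carries no vote, it cannot help the primary keep a majority once both voting secondaries have crashed.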



 Comments   
Comment by Andy Schwerin [ 22/Nov/13 ]

airesp, if you still have the data files from any of the secondaries, could you bring one up in standalone mode and post the result of a query that matches?

{ activeUniqueIdentifier: "refreshCounters_{\"userId\":100544310,\"albumId\":377350}" }
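The mongod log prints the duplicate key without escaping its embedded double quotes, which is why the query above uses `\"`. Judging from the oplog entry in the log, `activeUniqueIdentifier` appears to be the `taskId`, an underscore, and the JSON-encoded `parameters`; that composition is an assumption, not something confirmed in this ticket. A minimal sketch of building the same filter without hand-escaping:

```javascript
// Assumption: activeUniqueIdentifier = taskId + "_" + JSON-encoded parameters,
// inferred from the oplog entry in the log above.
const parameters = JSON.stringify({ userId: 100544310, albumId: 377350 });
const activeUniqueIdentifier = "refreshCounters_" + parameters;

// Filter document equivalent to the one in the comment above.
const filter = { activeUniqueIdentifier: activeUniqueIdentifier };

// In the mongo shell, on a secondary restarted standalone (without --replSet),
// the filter would be used against the collection named in the log:
//   db.getSiblingDB("fotki-bazinga").onetimeJobs.find(filter)
console.log(JSON.stringify(filter));
```

Serializing the filter reproduces the escaped `\"` sequences seen in the query above.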

Generated at Thu Feb 08 03:26:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.