[SERVER-9004] mongodb crashed with E11000 duplicate key error Created: 17/Mar/13  Updated: 18/Mar/13  Resolved: 18/Mar/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Roy Smith Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu Precise
Linux db-events4.songza.com 2.6.32-350-ec2 #56-Ubuntu SMP Mon Oct 22 17:46:54 UTC 2012 x86_64 GNU/Linux


Attachments: Text File mongodb.log     File mongodb.log-2    
Issue Links:
Duplicate
duplicates SERVER-4473 duplicate key error on local.slaves Closed
Operating System: ALL
Participants:

 Description   

A hidden node in one of our replica sets crashed. The last message in the log file was:

Sun Mar 17 03:08:48 [slaveTracking] update local.slaves query: { _id: ObjectId('501ad1104953cf639e791e62'), host: "10.29.211.199", ns: "local.oplog.rs" } update: { $set: { syncedTo: Timestamp 1363489723000|489 } } nscanned:1 fastmodinsert:1 keyUpdates:0 exception: E11000 duplicate key error index: local.slaves.$_id_ dup key: { : ObjectId('501ad1104953cf639e791e62') } code:11000 locks(micros) w:107758 107ms

The server restarted when I ran "service mongodb start".

Possibly related to SERVER-8012



 Comments   
Comment by Roy Smith [ 18/Mar/13 ]

OK, cool. Thanks for your help.

Comment by Scott Hernandez (Inactive) [ 18/Mar/13 ]

dup of SERVER-4473

Comment by Scott Hernandez (Inactive) [ 18/Mar/13 ]

The issue is with how slaves are tracked for replication. This error occurs when more than one slave has the same _id (for example, if you cloned a machine or dbpath). This can happen for various reasons and is quickly remedied: drop the "slaves" collection from the local db and it will be recreated. This collection is a cache of information from the active members of the replica set.
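
For readers who script this remedy, the drop described above might look like the following minimal Python sketch. It assumes a pymongo-style client object; the function name drop_slave_tracking_cache and the connection string in the comment are illustrative assumptions, not part of the original comment. The ticket does not say which member to run it on, so that choice is left to the operator.

```python
def drop_slave_tracking_cache(client):
    """Drop the local.slaves collection through a pymongo-style client.

    Per the comment above, the collection is only a cache and the server
    is expected to recreate it.
    """
    client["local"]["slaves"].drop()


# Hypothetical use against a live node (assumes pymongo is installed and
# that the placeholder address points at the affected member):
#   from pymongo import MongoClient
#   drop_slave_tracking_cache(MongoClient("mongodb://localhost:27017"))
```

The function only needs an object that supports client["local"]["slaves"].drop(), so it can be exercised with a stub before being pointed at a real server.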

Comment by Roy Smith [ 18/Mar/13 ]

Yeah, that turns out to be exactly what it was (for both crashes):

[2013-03-17 03:09:59] Killed process 4721 (mongod)
[2013-03-18 03:19:29] Killed process 7534 (mongod)

We'll reconfigure the machine to have more swap today.

I'm still confused about the E11000, though. How is it possible to get a duplicate key on a hidden secondary? The only inserts that happen on this machine are replication from the primary. Looking over the logs, I see another example (with the same key), a couple of days ago:

Sun Mar 17 03:08:48 [slaveTracking] update local.slaves query: { _id: ObjectId('501ad1104953cf639e791e62'), host: "10.29.211.199", ns: "local.oplog.rs" } update: { $set: { syncedTo: Timestamp 1363489723000|489 } } nscanned:1 fastmodinsert:1 keyUpdates:0 exception: E11000 duplicate key error index: local.slaves.$_id_ dup key: { : ObjectId('501ad1104953cf639e791e62') } code:11000 locks(micros) w:107758 107ms

Fri Mar 15 07:48:01 [slaveTracking] update local.slaves query: { _id: ObjectId('501ad1104953cf639e791e62'), host: "10.29.211.199", ns: "local.oplog.rs" } update: { $set: { syncedTo: Timestamp 1363333679000|100 } } nscanned:1 fastmodinsert:1 keyUpdates:0 exception: E11000 duplicate key error index: local.slaves.$_id_ dup key: { : ObjectId('501ad1104953cf639e791e62') } code:11000 locks(micros) w:107528 171ms

Comment by Scott Hernandez (Inactive) [ 18/Mar/13 ]

Roy, can you please check your system logs (/var/log/messages or syslog) for any mongod-related messages? This sounds like the Linux OOM killer.
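
A check like this could be sketched in Python as below. The helper find_oom_kills and its regular expression are illustrative assumptions; the two sample lines are the "Killed process" entries quoted elsewhere in this ticket. In practice the lines would come from a system log file such as /var/log/syslog.

```python
import re

# Matches kernel OOM-killer lines such as "Killed process 4721 (mongod)".
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines, process_name="mongod"):
    """Return (pid, line) pairs for lines reporting that process_name was killed."""
    hits = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits


# Sample lines taken from this ticket.
sample = [
    "[2013-03-17 03:09:59] Killed process 4721 (mongod)",
    "[2013-03-18 03:19:29] Killed process 7534 (mongod)",
]

for pid, line in find_oom_kills(sample):
    print(pid, line)
```

To scan a real log, pass an open file object instead of the sample list, for example find_oom_kills(open("/var/log/syslog")).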

Comment by Roy Smith [ 18/Mar/13 ]

Things are not looking good. The same server just crashed again. This time, there's no mention of an E11000 in the log. I'm uploading the latest log file (under the name mongodb.log-2).

I restarted the server. It came back up and settled into state SECONDARY (as it should).

Generated at Thu Feb 08 03:19:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.