[SERVER-9004] mongodb crashed with E11000 duplicate key error Created: 17/Mar/13 Updated: 18/Mar/13 Resolved: 18/Mar/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.2.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Roy Smith | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu Precise |
| Operating System: | ALL | ||||||||
| Description |
|
A hidden node in one of our replica sets crashed. The last message in the log file was:

Sun Mar 17 03:08:48 [slaveTracking] update local.slaves query: { _id: ObjectId('501ad1104953cf639e791e62'), host: "10.29.211.199", ns: "local.oplog.rs" }update: { $set: { syncedTo: Timestamp 1363489723000|489 }} nscanned:1 fastmodinsert:1 keyUpdates:0 exception: E11000 duplicate key error index: local.slaves.$id dup key: { : ObjectId('501ad1104953cf639e791e62') } code:11000 locks(micros) w:107758 107ms

The server restarted when I ran "service mongodb start".

Possibly related to |
| Comments |
| Comment by Roy Smith [ 18/Mar/13 ] |
|
OK, cool. Thanks for your help. |
| Comment by Scott Hernandez (Inactive) [ 18/Mar/13 ] |
|
dup of |
| Comment by Scott Hernandez (Inactive) [ 18/Mar/13 ] |
|
The issue is with how slaves are tracked for replication and this error is related to having more than one slave with the same _id (like if you cloned a machine/dbpath). This can happen for various reasons and is quickly remedied. Please drop the "slaves" collection from the local db and it will be recreated. This collection is a cache of information from the active members of the replica set. |
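
As an illustration of the remedy described above, here is a minimal Python sketch. It assumes a PyMongo-style client object connected directly to the affected member, and a driver version that is compatible with the server in question; the helper name `drop_slave_tracking_cache` is hypothetical and is not part of any MongoDB tooling. In the mongo shell the same step would typically be `db.getSiblingDB("local").slaves.drop()`.

```python
def drop_slave_tracking_cache(client):
    """Drop the local.slaves cache collection if it exists.

    `client` is assumed to behave like pymongo.MongoClient: indexing it
    by database name returns a database object that offers
    list_collection_names() and drop_collection().
    Returns True when a drop was issued, False when there was nothing to drop.
    """
    local_db = client["local"]
    if "slaves" not in local_db.list_collection_names():
        return False
    local_db.drop_collection("slaves")
    return True


# Hypothetical usage (requires pymongo; the host and port are placeholders):
#   from pymongo import MongoClient
#   drop_slave_tracking_cache(MongoClient("mongodb://localhost:27017"))
```

Checking for the collection first makes the helper safe to run more than once, which is convenient because the server recreates the collection on its own.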
| Comment by Roy Smith [ 18/Mar/13 ] |
|
Yeah, that turns out to be exactly what it was (for both crashes):

[2013-03-17 03:09:59] Killed process 4721 (mongod)

We'll reconfigure the machine to have more swap today.

I'm still confused about the E11000, though. How is it possible to get a duplicate key on a hidden secondary? The only inserts that happen on this machine are replication from the primary. Looking over the logs, I see another example (with the same key), a couple of days ago:

Sun Mar 17 03:08:48 [slaveTracking] update local.slaves query: { _id: ObjectId('501ad1104953cf639e791e62'), host: "10.29.211.199", ns: "local.oplog.rs" }update: { $set: { syncedTo: Timestamp 1363489723000|489 }} nscanned:1 fastmodinsert:1 keyUpdates:0 exception: E11000 duplicate key error index: local.slaves.$id dup key: { : ObjectId('501ad1104953cf639e791e62') } code:11000 locks(micros) w:107758 107ms

Fri Mar 15 07:48:01 [slaveTracking] update local.slaves query: { _id: ObjectId('501ad1104953cf639e791e62'), host: "10.29.211.199", ns: "local.oplog.rs" }update: { $set: { syncedTo: Timestamp 1363333679000|100 }} nscanned:1 fastmodinsert:1 keyUpdates:0 exception: E11000 duplicate key error index: local.slaves.$id dup key: { : ObjectId('501ad1104953cf639e791e62') } code:11000 locks(micros) w:107528 171ms |
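
For anyone wanting to check how often this error recurs, the following is a rough sketch of pulling the E11000 entries out of a mongod log. The regular expression is written against the two log lines quoted in this comment only, so it is an assumption about the log layout rather than a general parser; `E11000_RE` and `find_duplicate_key_errors` are names made up for this example.

```python
import re

# Pattern modeled on the two [slaveTracking] lines quoted in this ticket.
E11000_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"\[(?P<thread>[^\]]+)\] .*?"
    r"E11000 duplicate key error index: (?P<index>\S+)\s+"
    r"dup key: \{ : (?P<key>.*?) \}"
)


def find_duplicate_key_errors(lines):
    """Return one dict (when, thread, index, key) per E11000 log line."""
    hits = []
    for line in lines:
        match = E11000_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Hypothetical usage; the log path is a placeholder:
#   with open("/var/log/mongodb/mongodb.log") as log_file:
#       for hit in find_duplicate_key_errors(log_file):
#           print(hit["when"], hit["index"], hit["key"])
```

Grouping the results by `key` would show whether the same `_id` is involved every time, as it is in the two entries above.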
| Comment by Scott Hernandez (Inactive) [ 18/Mar/13 ] |
|
Roy, can you please check your system logs (/var/log/messages or syslog) for any mongod-related messages? This sounds like the Linux OOM killer. |
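
The check requested above could be scripted roughly as follows. This is only a sketch: it assumes the kernel reports the kill with a line containing "Killed process <pid> (<name>)", which is the form of the message quoted elsewhere in this ticket, and the names `KILLED_RE` and `find_oom_kills` are invented for the example.

```python
import re

# Matches kill reports of the form "Killed process 4721 (mongod)".
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(lines, process_name="mongod"):
    """Return (pid, line) pairs for kill reports naming process_name."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match and match.group("name") == process_name:
            kills.append((int(match.group("pid")), line.rstrip("\n")))
    return kills


# Hypothetical usage; which file exists depends on the distribution:
#   with open("/var/log/syslog") as syslog:
#       for pid, line in find_oom_kills(syslog):
#           print(pid, line)
```

An empty result does not rule out the OOM killer, since the relevant log file may have been rotated or the message format may differ between kernel versions.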
| Comment by Roy Smith [ 18/Mar/13 ] |
|
Things are not looking good. The same server just crashed again. This time, there's no mention of an E11000 in the log. I'm uploading the latest log file (under the name mongodb.log-2). I restarted the server. It came back up and settled into state SECONDARY (as it should). |