[SERVER-13953] Duplicate key error causes all secondaries to fassert() and crash in production
Created: 15/May/14 | Updated: 10/Dec/14 | Resolved: 15/May/14
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.4.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Jiangcheng Wu | Assignee: | Thomas Rueckstiess |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
Cluster = 1 PRIMARY + 3 SECONDARY + 1 ARBITER

I had occasionally seen duplicate key errors in mongodb.log before, but this was the first time the secondaries crashed.

mongodb logs:

Wed May 14 15:01:58.457 [repl writer worker 1] ERROR: writer worker caught exception: E11000 duplicate key error index: vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.Dongxi.$dongxiID_1 dup key: { : null } on: { ts: Timestamp 1400050387000|17, h: -1650836744667758, v: 2, op: "i", ns: "vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.system.indexes", o: { name: "dongxiID_1", ns: "vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.Dongxi", background: true, unique: true, key: { dongxiID: 1 } } }
***aborting after fassert() failure
Wed May 14 15:01:58.459 Got signal: 6 (Aborted).
Wed May 14 15:01:58.461 Backtrace:

} }
***aborting after fassert() failure
Wed May 14 14:53:08.748 Got signal: 6 (Aborted).
Wed May 14 14:53:08.750 Backtrace:

SECONDARY3:
Wed May 14 14:53:07.353 [repl writer worker 13] ERROR: writer worker caught exception: E11000 duplicate key error index: vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.Dongxi.$dongxiID_1 dup key: { : null } on: { ts: Timestamp 1400050387000|17, h: -1650836744667758, v: 2, op: "i", ns: "vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.system.indexes", o: { name: "dongxiID_1", ns: "vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.Dongxi", background: true, unique: true, key: { dongxiID: 1 } } }
***aborting after fassert() failure
Wed May 14 14:53:07.409 Got signal: 6 (Aborted).
Wed May 14 14:53:07.421 Backtrace:

I've attached the db files of vo5xzp29vrev6q5e49nof23a7lljzgexvqm0u95p19jjysfx.
| Comments |
| Comment by Thomas Rueckstiess [ 15/May/14 ] |

Hi Jiangcheng,

Is it possible that you attempted to build the index multiple times (e.g. ran the call to ensureIndex() twice)? Our current hypothesis is that you're experiencing a known bug.

To fix your secondaries, the best choice is to resync each of them one by one from the primary. I would also recommend that you upgrade to the latest stable version in the 2.4 series (currently 2.4.10) before attempting to build the index again.

As this appears to be a duplicate of

Regards,
| Comment by Jiangcheng Wu [ 15/May/14 ] |

db data dump from PRIMARY
| Comment by Jiangcheng Wu [ 15/May/14 ] |

1. Indexes on PRIMARY:
2. PRIMARY mongodb log:
3. It was from SECONDARY3, and I have attached the data from PRIMARY.

If you need anything else, please let me know. Thank you.
| Comment by Thomas Rueckstiess [ 15/May/14 ] |

Hi Jiangcheng,

It looks like you were trying to build a unique index on the field dongxiID, but there are many duplicate values, so the index build failed. I'd like to find out why it succeeded on the primary, though, and was replicated via the oplog.

To continue the diagnosis I would need some more information.

1. Can you please run the following command to get the indexes on the Dongxi collection on the primary (the node that did not crash):
2. We also need to look at the log file of the primary node. Can you please attach the log file reaching back to before you started the index build?

Thanks,
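
To illustrate why a unique index build on dongxiID can fail with "dup key: { : null }", here is a minimal sketch in plain JavaScript. It does not query MongoDB: the helper name findDuplicateKeys and the sample documents are made up for illustration. It relies on one assumption about the data, namely that several documents lack the dongxiID field. A non-sparse index stores a missing field as null, so two such documents collide under a unique constraint.

```javascript
// Hypothetical helper (not a MongoDB API): groups documents by the value a
// single-field, non-sparse index would store for `field` and returns every
// value that occurs more than once.
function findDuplicateKeys(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    // A document without the field is indexed under null.
    const value = doc[field] === undefined ? null : doc[field];
    const key = JSON.stringify(value);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const duplicates = [];
  for (const [key, count] of counts) {
    if (count > 1) {
      duplicates.push({ key: JSON.parse(key), count: count });
    }
  }
  return duplicates;
}

// Made-up sample data: two documents have no dongxiID at all.
const sampleDocs = [
  { _id: 1, dongxiID: "a" },
  { _id: 2 },
  { _id: 3, dongxiID: "b" },
  { _id: 4 },
];

const dups = findDuplicateKeys(sampleDocs, "dongxiID");
console.log(dups); // the null key is reported with a count of 2
```

If the real collection behaves like this sample, the duplicates would have to be removed, or the field populated, before a unique index can be built. A unique index that is also sparse skips documents that lack the field; whether that is appropriate depends on the application.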