[SERVER-14897] Transient but crashing "ERROR: writer worker caught exception: E11000 duplicate key error index" situation
Created: 14/Aug/14 Updated: 14/Nov/14 Resolved: 14/Nov/14
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Samuele Pedroni | Assignee: | Ramon Fernandez Marina |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Operating System: | ALL |
| Participants: |
| Description |
We have a setup with a single three-member replica set. During a series of successive restarts (triggered by our deployment infrastructure), at some point both secondaries went down with: ERROR: writer worker caught exception: E11000 duplicate key error index ... I am attaching the logs of the three members around the event: LOG2 and LOG3 are from the crashing secondaries, and LOG1 is from the original primary, which kept running. The data referenced there should already have been in the system; there may, however, have been an insert in progress at that time, which should simply have produced a duplicate key error on the client side. After a while we restarted the three members manually and they came up properly. We checked whether duplicate data was actually present, but it was not. As can be seen from the logs, this involves a multikey unique index. This may be a still-present corner case of
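A toy model may help pin down the expected behaviour described above. This is a minimal sketch, not MongoDB's implementation: it assumes the usual semantics of a unique index on an array field, where each distinct array element becomes its own index key, so an insert whose array shares any element with another document's array is rejected with E11000 before anything is written. The class `ToyUniqueMultikeyIndex`, the field name `values` and the `DuplicateKeyError` stand-in are all hypothetical.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for a server-side E11000 duplicate key error."""


class ToyUniqueMultikeyIndex:
    """Toy unique index on an array field: one key per distinct element."""

    def __init__(self, field):
        self.field = field
        self.entries = {}  # index key -> _id of the document that owns it

    def keys_for(self, doc):
        value = doc.get(self.field)
        # Multikey: an array contributes one key per distinct element,
        # so repeating an element inside the same document is not a conflict.
        if isinstance(value, list):
            return set(value)
        return {value}

    def insert(self, doc):
        keys = self.keys_for(doc)
        # Check every key first, so a rejected insert writes nothing at all.
        for key in keys:
            owner = self.entries.get(key)
            if owner is not None and owner != doc["_id"]:
                raise DuplicateKeyError(f"E11000 duplicate key: {self.field}: {key!r}")
        for key in keys:
            self.entries[key] = doc["_id"]


if __name__ == "__main__":
    idx = ToyUniqueMultikeyIndex("values")  # hypothetical field name
    idx.insert({"_id": 1, "values": ["a", "b"]})
    try:
        # Shares "b" with _id 1, so the client gets the error.
        idx.insert({"_id": 2, "values": ["b", "c"]})
    except DuplicateKeyError as exc:
        print(exc)  # E11000 duplicate key: values: 'b'
```

Under this model a rejected insert leaves no trace, so, assuming all members hold the same index definitions, a secondary replaying the primary's writes should not hit E11000 for that insert.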
| Comments |
| Comment by Ramon Fernandez Marina [ 14/Nov/14 ] |
pedronis, since we haven't heard back from you for some time, we're closing this ticket. If this is still an issue for you, feel free to re-open the ticket and provide the additional information requested above. Regards,
| Comment by Samuele Pedroni [ 01/Sep/14 ] |
EnsureIndex for the extra index is currently run at the startup of various services.
| Comment by Ramon Fernandez Marina [ 25/Aug/14 ] |
Can you post the indexes for your "tokens" collection in the "push" database from all your nodes?
Also, I'm afraid we'll need more information on how those indexes were built, as the logs do not contain it. This looks very similar to
Thanks,
| Comment by Samuele Pedroni [ 20/Aug/14 ] |
On Tue, Aug 19, 2014 at 9:02 PM, Ramon Fernandez (JIRA) <jira@mongodb.org>
when they start up, they were likely also bouncing back around the same
| Comment by Ramon Fernandez Marina [ 19/Aug/14 ] |
pedronis, the logs seem to be missing the fassert() stack traces, which could show whether this is data-related or index-related. The reason I'm asking is that this could be an instance of