[SERVER-14897] transient but crashing ERROR: writer worker caught exception: E11000 duplicate key error index situation Created: 14/Aug/14  Updated: 14/Nov/14  Resolved: 14/Nov/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Samuele Pedroni Assignee: Ramon Fernandez Marina
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File LOG1     HTML File LOG2     HTML File LOG3    
Operating System: ALL
Participants:

 Description   

We have a setup with a single 3-member replica set; during a series of successive restarts (triggered by our deployment infrastructure), both secondaries at some point went down with:

ERROR: writer worker caught exception: E11000 duplicate key error index ...

Attached are the logs of the 3 replicas around the event. LOG2 and LOG3 are from the crashing secondaries; LOG1 is from the original primary, which continued to run.

The data referenced there should already have been in the system. There could, though, have been an insert in progress at that time, which should just have produced a duplicate key error on the client side.

After a while we restarted the 3 replicas manually and they came up properly. We checked whether actual duplicate data was present, but that wasn't the case.

As can be seen from the logs, this involves a multikey unique index.
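
A check for duplicate data on the unique index's key could be sketched roughly as follows. This is not necessarily the check that was actually run; the field names are taken from the index definition posted in the comments, and duplicateKeyPipeline is a hypothetical helper. If any of these fields holds arrays (a multikey index), the grouping compares whole array values rather than individual index entries, so the result is only an approximation.

```javascript
// Hypothetical helper: builds an aggregation pipeline that groups documents
// by the given key fields and keeps only groups with more than one member.
function duplicateKeyPipeline(fields) {
  var groupId = {};
  fields.forEach(function (f) { groupId[f] = "$" + f; });
  return [
    { $group: { _id: groupId, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
  ];
}

var pipeline = duplicateKeyPipeline(["shardkey", "userid", "devid", "appid"]);

// Assumed usage in a mongo shell connected to the node being checked:
//   use push
//   db.tokens.aggregate(pipeline)
// An empty result means no two documents share the same key combination.
```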

This may be a still-present corner case of SERVER-6671.



 Comments   
Comment by Ramon Fernandez Marina [ 14/Nov/14 ]

pedronis, since we haven't heard back from you for some time we're closing this ticket. If this is still an issue for you feel free to re-open the ticket and provide the additional information requested above.

Regards,
Ramón.

Comment by Samuele Pedroni [ 01/Sep/14 ]

> db.tokens.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "push.tokens",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"shardkey" : 1,
			"userid" : 1,
			"devid" : 1,
			"appid" : 1
		},
		"unique" : true,
		"ns" : "push.tokens",
		"name" : "shardkey_1_userid_1_devid_1_appid_1"
	}
]

ensureIndex for the extra index is currently run at startup by various services.
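
For reference, an ensureIndex call that would produce the unique index listed above might look like the sketch below. The actual call made by the services is not shown in this ticket, so the options (in particular the absence of a background option) are an assumption, and defaultIndexName is a hypothetical helper that only illustrates how the index name follows from the key pattern.

```javascript
// Key pattern and options matching the second index in the getIndexes() output.
var keys = { shardkey: 1, userid: 1, devid: 1, appid: 1 };
var options = { unique: true }; // assumption: no "background" option is passed

// Hypothetical helper: derives the default index name from the key pattern
// by joining each field and its direction with underscores.
function defaultIndexName(keyPattern) {
  return Object.keys(keyPattern).map(function (field) {
    return field + "_" + keyPattern[field];
  }).join("_");
}

// Only runs inside a mongo shell, where the global "db" exists.
if (typeof db !== "undefined") {
  db.getSiblingDB("push").tokens.ensureIndex(keys, options);
}
```

When several services issue the same call at startup, each one asks the server to ensure the same index; whether that played a role in the crash is not established in this ticket.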

Comment by Ramon Fernandez Marina [ 25/Aug/14 ]

Can you post the indexes for your "tokens" collection in the "push" database from all your nodes?

use push
db.tokens.getIndexes()

Also I'm afraid we'll need more information on how those indexes were built, as the logs do not contain such information. This looks very similar to SERVER-9293.

Thanks,
Ramón.
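
Once the getIndexes() output has been collected from every node, the definitions can be compared mechanically. The following plain JavaScript sketch assumes the output of each node has been pasted into an object keyed by a node label; diffIndexes and the labels are hypothetical names, and only the key pattern and the unique flag are compared.

```javascript
// Hypothetical helper: compares getIndexes() output collected from several nodes.
// byNode maps a node label to the array returned by db.tokens.getIndexes().
function diffIndexes(byNode) {
  var nodes = Object.keys(byNode);
  var seen = {}; // index name -> { node label: serialized definition }
  nodes.forEach(function (node) {
    byNode[node].forEach(function (ix) {
      seen[ix.name] = seen[ix.name] || {};
      // Key order matters for compound indexes; JSON.stringify preserves it.
      seen[ix.name][node] = JSON.stringify({ key: ix.key, unique: !!ix.unique });
    });
  });
  var problems = [];
  Object.keys(seen).forEach(function (name) {
    var defs = seen[name];
    nodes.forEach(function (node) {
      if (!(node in defs)) problems.push(name + " missing on " + node);
    });
    var distinct = {};
    Object.keys(defs).forEach(function (node) { distinct[defs[node]] = true; });
    if (Object.keys(distinct).length > 1) problems.push(name + " differs between nodes");
  });
  return problems;
}
```

An empty array means every node reports the same set of indexes with the same key pattern and uniqueness.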

Comment by Samuele Pedroni [ 20/Aug/14 ]

On Tue, Aug 19, 2014 at 9:02 PM, Ramon Fernandez (JIRA) <jira@mongodb.org>

when they start up, they were likely also bouncing back around the same
time. Is that a form of what you are referring to, and of that bug?

Comment by Ramon Fernandez Marina [ 19/Aug/14 ]

pedronis, the logs seem to be missing the fassert() stack traces, which could provide more information on whether this is data-related or index-related. I'm asking because this could be an instance of SERVER-12662, fixed in 2.4.10. Are you using background index builds? Is there any further information you can provide?
