[SERVER-19794] Fatal Assertion 16360: duplicate key error during replication Created: 06/Aug/15 Updated: 31/May/17 Resolved: 03/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | 3.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shankar Karuppiah | Assignee: | Eric Milkie |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Operating System: | Linux | ||||||||
| Participants: | |||||||||
| Description |
|
| Comments |
| Comment by Ramon Fernandez Marina [ 07/May/16 ] | |||||||||||||||
|
saisrinivase@gmail.com, for MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. Regards, | |||||||||||||||
| Comment by Sai [ 07/May/16 ] | |||||||||||||||
|
Hi Sachin, Sorry for the delay in responding. We neither upgraded nor downgraded the version; I have restarted re But the major concern is that sometimes the mongo services are unexpectedly going When I looked in the error log I didn't see any errors and I am not sure Do you have any clue, and can you help me with this? | |||||||||||||||
| Comment by Sachin Kumar [ 06/May/16 ] | |||||||||||||||
|
Hello Ramon, thanks for your response, but I wanted to know whether anyone who did the upgrade got the issue 100% resolved. If I get the same issue with the newer version, then there is no point in upgrading. This is the only issue I am facing; otherwise mongo is doing well for me. So I wanted to be 100% sure before upgrading. Thanks | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 05/May/16 ] | |||||||||||||||
|
sachinrohdia@gmail.com, if you're seeing the behavior described in this ticket it could be due to Thanks, | |||||||||||||||
| Comment by Sachin Kumar [ 05/May/16 ] | |||||||||||||||
|
Hello Sai, did upgrading to 3.0.11 resolve this issue? | |||||||||||||||
| Comment by Sai [ 13/Apr/16 ] | |||||||||||||||
|
Thank you, Ramon! | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 13/Apr/16 ] | |||||||||||||||
|
We were not able to reproduce the behavior described in this ticket, but as pointed out above it could be related to Thanks, | |||||||||||||||
| Comment by Sai [ 13/Apr/16 ] | |||||||||||||||
|
Hi Ramon, Thank you, I have already started the re-sync to make it a secondary. I would be glad to know more about the fatal assertion with the duplicate key error, because that is the one I am not able to figure out. Could you help me with this? I appreciate your help! | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 13/Apr/16 ] | |||||||||||||||
|
saisrinivase@gmail.com, it seems this node has become too stale to be able to catch up, and you'll need to resync it. If the problem persists please open a new ticket so we can investigate it separately. Thanks, | |||||||||||||||
| Comment by Sai [ 13/Apr/16 ] | |||||||||||||||
|
Again today the re-sync failed with the error below:
2016-04-12T20:45:59.697-0400 I REPL [ReplicationExecutor] syncing from: xxxxxxxxxx:27015
2016-04-12T20:46:20.811-0400 I QUERY [conn15428] assertion 13436 not master or second
2016-04-12T20:47:15.724-0400 I QUERY [conn15428] assertion 13436 not master or second | |||||||||||||||
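Log excerpts like the ones quoted in this ticket can be scanned mechanically for assertion codes. The following is a minimal sketch, assuming the line layout seen in these comments (timestamp, severity letter, component, bracketed context, message). The name `find_assertions` is hypothetical and not part of any MongoDB tool.

```python
import re

# Assumed layout, inferred from the log lines quoted in this ticket:
#   <timestamp> <severity> <component> [<context>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[A-Z])\s+(?P<comp>\S+)\s+"
    r"\[(?P<ctx>[^\]]+)\]\s+(?P<msg>.*)$"
)
# Matches both "assertion 13436 ..." and "Fatal Assertion 16361".
ASSERT_RE = re.compile(r"\b(?P<fatal>Fatal )?[Aa]ssertion (?P<code>\d+)")


def find_assertions(lines):
    """Return (timestamp, context, code, is_fatal) for each assertion line."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not in the assumed layout; skip it
        a = ASSERT_RE.search(m.group("msg"))
        if a:
            hits.append(
                (
                    m.group("ts"),
                    m.group("ctx"),
                    int(a.group("code")),
                    a.group("fatal") is not None,
                )
            )
    return hits


if __name__ == "__main__":
    sample = [
        "2016-04-12T20:45:59.697-0400 I REPL [ReplicationExecutor] syncing from: xxxxxxxxxx:27015",
        "2016-04-12T20:46:20.811-0400 I QUERY [conn15428] assertion 13436 not master or second",
        "2016-03-23T20:15:51.366-0400 I - [repl writer worker 2] Fatal Assertion 16361",
    ]
    for hit in find_assertions(sample):
        print(hit)
```

Separating fatal assertions from ordinary ones makes it easier to tell a crash on a repl writer thread apart from a client query that was rejected while the node was still syncing.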
| Comment by Shankar Karuppiah [ 12/Apr/16 ] | |||||||||||||||
|
I think this issue is related to https://jira.mongodb.org/browse/SERVER-21275 | |||||||||||||||
| Comment by Sai [ 12/Apr/16 ] | |||||||||||||||
|
2016-03-23T20:15:51.366-0400 I - [repl writer worker 2] Fatal Assertion 16361
The error message is as above; please help me with this. Thanks in advance! | |||||||||||||||
| Comment by Sai [ 11/Apr/16 ] | |||||||||||||||
|
Do we have any solution for this in version 3.0.7? I would be thankful for any solution to this error. I have started a re-sync twice, but each time it eventually fails with the same error as posted by Shankar. | |||||||||||||||
| Comment by JongWon Kim [ 03/Nov/15 ] | |||||||||||||||
|
We're experiencing the same issue that the original author mentioned. I think I can answer the questions. 1. No. We installed 3.0.7 on every node from the beginning, and migrated data from the nodes running 2.4.13. | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 03/Oct/15 ] | |||||||||||||||
|
shankar.k, we haven't heard back from you for a while so we're closing this ticket. If this is still an issue for you please provide the additional information requested by Sam above. Thanks, | |||||||||||||||
| Comment by Sam Kleinman (Inactive) [ 01/Sep/15 ] | |||||||||||||||
|
Sorry for the delay in getting back to you. I've discussed this case with a few of my colleagues and we have three possible explanations for the error that you saw:
In most cases the clock skew should not be a problem, although running a time synchronization service (e.g. ntpd) may be a good idea as a matter of general practice. There are some cases where the combination of clock skew and unreliable networks can produce a replica set with multiple primaries for a short period of time. This is confusing for the client driver, but should lead to a rollback rather than the kind of error that you see. In an attempt to better understand what's happening here, could you answer the following questions:
Sorry again for the delay, and thanks for your help. Regards, | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 12/Aug/15 ] | |||||||||||||||
|
Thanks for uploading the logs shankar.k; this is what I see on the primary:
Right after the delete command the secondary mongodb-alpha-1 becomes unreachable; in the mongodb-alpha-1 logs I see:
Both commands mention the offending objectid, 55c05dbdd896ff11c05bf548, so I'm trying to understand how the delete command may trigger this assertion on the secondary node. I'm very suspicious of the clock difference between these two nodes, which is more than 10 minutes. I don't know yet if this is the source of the problem or just a coincidence. We'll investigate this further and let you know if we need additional information. I see that mongodb-alpha-1 successfully transitioned to secondary after a restart. I also see a number of restarts on this system; are you still affected by this issue, or are the restarts expected? Thanks, | |||||||||||||||
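A clock difference like the one mentioned above can be roughly estimated by comparing the timestamps two nodes log for the same event. This is a sketch only, assuming timestamps in the `YYYY-MM-DDTHH:MM:SS.mmm±HHMM` form seen elsewhere in this ticket. The name `clock_skew_seconds` and the sample timestamps are hypothetical and not taken from the uploaded logs. The result also includes any network and processing delay between the two log entries, so it is an approximation at best.

```python
from datetime import datetime

# Timestamp layout assumed from the log lines quoted in this ticket,
# e.g. 2016-04-12T20:45:59.697-0400
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def clock_skew_seconds(ts_a, ts_b):
    """Return ts_b minus ts_a in seconds (positive if b's clock is ahead)."""
    a = datetime.strptime(ts_a, TS_FORMAT)
    b = datetime.strptime(ts_b, TS_FORMAT)
    # Both values are timezone-aware, so differing UTC offsets are handled.
    return (b - a).total_seconds()


if __name__ == "__main__":
    # Hypothetical timestamps for the same operation as logged on two nodes.
    primary_ts = "2015-08-04T06:40:00.000+0000"
    secondary_ts = "2015-08-04T06:51:30.500+0000"
    skew = clock_skew_seconds(primary_ts, secondary_ts)
    print(f"approximate skew: {skew:.1f} s")
```

If the estimate is far larger than plausible replication lag, the node clocks are likely out of sync, and a time synchronization service such as ntpd is worth checking.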
| Comment by Shankar Karuppiah [ 06/Aug/15 ] | |||||||||||||||
|
Hello Ramon, Thank you for creating the bucket. I have uploaded the logs. For info about load, you could check our MMS account; the organization name is loveclients. Thank you, | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 06/Aug/15 ] | |||||||||||||||
|
Here it is: https://10gen-httpsupload.s3.amazonaws.com/upload_forms/e108a7f2-b1d4-492c-96dd-841c1ff25316.html | |||||||||||||||
| Comment by Shankar Karuppiah [ 06/Aug/15 ] | |||||||||||||||
|
Hello Ramon, Could you create a MongoDB Support File Upload bucket, please? Thank you, | |||||||||||||||
| Comment by Ramon Fernandez Marina [ 06/Aug/15 ] | |||||||||||||||
|
Hi shankar.k, could you please upload full logs for the primary node as well as the affected secondary? Are there other details you can provide as to how this assertion was triggered (load, type of operations, etc.)? Thanks, |