[SERVER-19633] Fatal Assertion 16360 during initial sync Created: 28/Jul/15 Updated: 25/Nov/15 Resolved: 10/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.9, 2.6.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | John Fisher | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
After provisioning a new database server, and adding it to our existing replica set, we got this error during the initial sync. A large portion of the data transferred successfully before we got the error. The duplicate key error mentioned in the logs does not exist on the primary. The index in question (on Primary) is:
The pre-existing Primary and Secondary hosts are running 2.6.9, but the new server (on which the error occurs) is running 2.6.10. We've tried to bring this new host up twice, and got the same error both times. (Though on different objects)
|
| Comments |
| Comment by Ramon Fernandez Marina [ 10/Oct/15 ] |
|
jr@datafox.co, thanks for uploading the logs and for your patience. Unfortunately we haven't been able to ascertain that it was a bug in mongod that caused this behavior. This comment in A possible workaround is to resync this secondary node. Depending on the size of the data this may be a long operation, so one thing you may want to try first is to rebuild the linkedin_ids index on the secondary node first. On the off chance that this issue was caused by a bug in v0 indexes, you may want to consider rebuilding this index on the primary as well to update it to v1. Sorry you run into this, and I hope the workaround provided above allows you to get you back up and running. Regards, |
| Comment by Ramon Fernandez Marina [ 31/Aug/15 ] |
|
Hi jr@datafox.co, apologies for the radio silence. This is to let you know that we're still investigating this issue alongside Thanks for your patience, |
| Comment by John Fisher [ 30/Jul/15 ] |
|
Here is the full log from the replication fatal assertion issue. |
| Comment by Ramon Fernandez Marina [ 30/Jul/15 ] |
|
jr@datafox.co, which version of MongoDB was this index created with? As you may have noticed, the version number is 0, which corresponds to an old version; if you were to create an index with 2.6.9 the version would be 1. I'm asking because this may be the source of the issue: when I try to reproduce this assertion and I create a 2-node replica set running 2.6.9, a v0 index does replicate as a v0 index – but when I add a 2.6.10 node the index is built as v1. While I haven't been able to trigger the assertion I'm suspecting index versions may be the issue. I'd like to get full logs for the 2.6.10 node since startup until it aborts replication and shuts down. If you don't have them but are willing to repeat the experiment I'd like to ask you to start mongod with -vv to get more verbose logs. Is this something you can do to help us track down the source of the problem? Thanks, |
| Comment by John Fisher [ 30/Jul/15 ] |
|
Any updates? |