[SERVER-29124] Fatal Assertion 16360 Created: 11/May/17 Updated: 09/Feb/18 Resolved: 18/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication, WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Bob Lunney | Assignee: | Kelsey Schubert |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Steps To Reproduce: | Run mongod with replication. |
| Participants: |
| Description |
|
A replica set secondary crashed with the following assertion error:
At the time, the primary was under load, servicing 3.6k reads and 132 writes per second. The other replica set secondary was being rebuilt when this one crashed. There are multiple 'duplicate key' errors in the primary's log, where documents were rejected on insert, but none match the document reported as duplicate by the secondary that crashed. MongoDB 3.2.11 on Amazon Linux, version 2016.03 |
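To illustrate why a duplicate key on a secondary is fatal rather than an ordinary client error, here is a minimal, hypothetical sketch (not MongoDB source code): a secondary must apply the primary's oplog exactly, so hitting a duplicate key in a unique index implies the node's data has diverged, and the server aborts (fassert) instead of continuing with inconsistent data.

```python
# Hypothetical simulation of oplog application on a secondary.
# Names (FatalAssertion, SecondaryApplier) are illustrative, not mongod APIs.

class FatalAssertion(Exception):
    """Stands in for mongod's fatal assertion (e.g. fassert 16360) abort."""

class SecondaryApplier:
    def __init__(self):
        self.unique_index = {}  # indexed key -> document id

    def apply_insert(self, key, doc_id):
        if key in self.unique_index:
            # On a primary this would be a normal DuplicateKey error returned
            # to the client; during oplog application it signals divergence,
            # so the process aborts rather than proceed.
            raise FatalAssertion(f"E11000 duplicate key error: {key!r}")
        self.unique_index[key] = doc_id

applier = SecondaryApplier()
applier.apply_insert("alice@example.com", 1)
try:
    applier.apply_insert("alice@example.com", 2)  # conflicting replayed op
except FatalAssertion as e:
    print("secondary would abort:", e)
```

This is only a sketch of the failure mode being reported, under the assumption that the crashed secondary encountered a key it already had while applying an insert from the oplog.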
| Comments |
| Comment by Bob Lunney [ 15/Sep/17 ] |
|
Kelsey, This issue has not recurred since the initial report. We have also not had a failover since then, but we're not looking forward to the inevitable, either. Unfortunately, as a Mongo noob, I wasn't aware of the value of the diagnostics directory, and probably destroyed any chance of solving this mystery. I am aware that unique indexes on secondaries somehow rely on WiredTiger's MVCC mechanism, as dumps made from secondaries with mongodump will sometimes contain duplicate data that prevents unique index creation. If there is anything else I can do to help, please let me know. Otherwise I suggest closing the ticket as unsolvable, since the diagnostic data isn't available. Thanks for your efforts! Bob |
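The duplicate-data symptom mentioned above can be checked for before restoring a dump. Below is a hypothetical helper (not part of MongoDB's tooling; the function name and fields are illustrative) that scans documents for values that would violate a unique index, so an index build does not fail partway through a restore.

```python
# Hypothetical pre-restore check: find values of a field that appear in
# more than one document, i.e. would violate a unique index on that field.
from collections import defaultdict

def find_duplicate_keys(docs, key_field):
    seen = defaultdict(list)
    for doc in docs:
        seen[doc[key_field]].append(doc["_id"])
    return {k: ids for k, ids in seen.items() if len(ids) > 1}

docs = [
    {"_id": 1, "email": "a@x.com"},
    {"_id": 2, "email": "b@x.com"},
    {"_id": 3, "email": "a@x.com"},  # duplicate that would break the index build
]
print(find_duplicate_keys(docs, "email"))  # {'a@x.com': [1, 3]}
```

In practice the documents would come from the dump itself (or a query against the restored collection) rather than an in-memory list.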
| Comment by Kelsey Schubert [ 15/Sep/17 ] |
|
Hi blunney, We've been working to understand what has happened here, but haven't had much success. Have you encountered this issue since the initial report? Thanks, |
| Comment by Bob Lunney [ 12/May/17 ] |
|
Thomas, Thanks for your help. I have uploaded:
Sadly, I don't have the diagnostic.data files, nor the indexes for the affected collection from the secondaries at the time of the incident. We needed the secondaries back, so the data directory was purged and the secondaries resynced with the primary. I'll know better next time. The failed secondary (rs1) was the primary prior to the fatal assertion error. The new primary (rs2) took over via an automatic failover. Just prior to the automatic failover, the other secondary (rs0) was shut down, its data directory purged, and restarted to resync it with the primary (rs1 at the time). Then the failover event occurred, rs2 was elected primary, rs1 crashed, and eventually rs0 began resyncing from rs2. At this point we had rs2 in PRIMARY mode, rs0 in STARTUP2, and rs1 down, i.e. no secondary to fail over to. I let rs0 finish resyncing and transition to SECONDARY mode prior to resyncing rs1. Thanks for your help, and please let me know if there is any more information I can provide. |
| Comment by Kelsey Schubert [ 12/May/17 ] |
|
Hi blunney, I've created a secure upload portal where you can upload diagnostic files. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time. To help us investigate this issue, would you please provide the following information?
Thank you for your help, |
| Comment by Bob Lunney [ 11/May/17 ] |
|
Correction: MongoDB 3.2.11 on Amazon Linux, version 2016.03 Addition: Running WiredTiger as the storage engine. |