[SERVER-23841] Mongod always complains "Fatal assertion 18750 UnrecoverableRollbackError" after abnormal mongod termination Created: 21/Apr/16 Updated: 08/Feb/23 Resolved: 08/Sep/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | 아나 하리 | Assignee: | Backlog - Replication Team |
| Resolution: | Duplicate | Votes: | 3 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Replication |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | 1) Kill the primary mongod of the replica set during CRUD operations, |
| Sprint: | Repl 15 (06/03/16), Repl 16 (06/24/16) |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
The MongoDB server always complains "Fatal assertion 18750" after an abnormal mongod termination. Sometimes the mongod server hangs and becomes almost entirely unresponsive (
Is this normal behavior, or might there be something wrong with my configuration? I found a few bug reports in JIRA, but they say the issue is already fixed. Thanks.
| Comments |
| Comment by Eric Milkie [ 08/Sep/16 ] |

This problem is now fixed by the work in
| Comment by Scott Hernandez (Inactive) [ 22/Jul/16 ] |

This is really a duplicate of
| Comment by Benety Goh [ 09/Jun/16 ] |

Hi matt.lee,

After examining the logs, it appears that shard01-mongo3 tried to sync from another secondary (shard01-mongo2) that had not yet synced its oplog with the new primary (shard01-mongo1). The crash occurred because shard01-mongo3 held internal post-rollback state that was inconsistent with the secondary it was trying to sync from. We recognize that this is a defect in the server, and the work that will address this issue is described in

If you restart shard01-mongo3, it should be able to find two consistent nodes to sync from. I'll close this ticket as a duplicate of

Regards,
| Comment by Kelsey Schubert [ 12/May/16 ] |

Hi matt.lee,

Thank you for providing the log files. We have what we need to debug this issue, so I am sending this ticket to the replication team. Please continue to watch for updates.

Kind regards,
| Comment by 아나 하리 [ 12/May/16 ] |

Hi Thomas Schubert,

I have experienced the same issue on a stable replica set.
I stepped down the primary (shard01-mongo1) by changing its priority, and that time everything was fine.
One minute later, I stepped down the primary (shard01-mongo2) by changing its priority, but this time shard01-mongo3 failed.
Please check the attached log files.
| Comment by 아나 하리 [ 25/Apr/16 ] |

It looks like this issue is caused by

Thanks.
| Comment by 아나 하리 [ 22/Apr/16 ] |

Hi Thomas Schubert,

1. Yes.
2-1. This is a test environment, and there is no backup. I use only MongoDB 3.2.5.

Anyway, is it correct that a MongoDB server can find the last position in its oplog, replicate the remaining operations, and rejoin the replica set even after the node has crashed (even if there is a rollback), provided that journaling is enabled and the disk is not corrupted?

Thanks.
| Comment by Kelsey Schubert [ 21/Apr/16 ] |

Hi matt.lee,

Thank you for reporting this behavior. To help us investigate this issue, please answer the following questions:

To recover, I would recommend either completing an initial sync or restoring from a backup.

Kind regards,