[SERVER-26271] UnrecoverableRollbackError after election Created: 23/Sep/16 Updated: 30/Sep/16 Resolved: 27/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.2.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Maziyar Panahi | Assignee: | Kelsey Schubert |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
Hello, Last night the primary node stepped down for a reason I don't know, and a secondary became primary. The former primary immediately displayed this error:
This morning I tried to start mongod on the former primary, but the same error appeared after a few seconds. I have had this problem before, but that time it was due to an ungraceful shutdown and a bad RAID array. In this replica set, each node is on a RAID 10 SSD array with plenty of memory and CPU. Journaling is enabled on both mongod and the file system. Again, there was no ungraceful shutdown; it just happened during the election, even though the primary had the highest priority. Also, I had just finished a 2-day mongod --repair on the primary before this incident and rsynced the disk image to the other secondaries, so they were all identical with no corruption. I attached logs, diagnostics, and the WiredTiger file for the former primary and the new primary. The incident happened at "2016-09-23T04:10". Please let me know if you need anything else.
Log from new primary:
|
| Comments |
| Comment by Maziyar Panahi [ 30/Sep/16 ] |
|
Hi Thomas, Thanks for your reply; I was guessing the same thing. I have disabled chained replication for the time being and also resynced the affected node from a good source using rsync. All 3 nodes have now been working properly with no issues for one week. I will keep track of SERVER-25848 and wait for version 3.4. Many thanks Thomas, and have a great day. Cheers, |
| Comment by Kelsey Schubert [ 27/Sep/16 ] |
|
Hi maziyar, Thank you for the very detailed report and for including the relevant logs. I have examined the logs and identified that this issue will be resolved in MongoDB 3.4. In the interim, I would recommend disabling chained replication, as we have observed that this mitigates the issue. To resolve this fatal assertion, please execute a clean resync on the affected secondary. Kind regards, |
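
For anyone applying the chained-replication workaround mentioned above, here is a minimal sketch of the configuration change in Python. It assumes the replica set configuration has already been fetched as a plain dict (for example via the `replSetGetConfig` admin command) and will be applied afterwards with `replSetReconfig` against the primary. The helper name `with_chaining_disabled` and the sample config are hypothetical and not taken from this ticket. `settings.chainingAllowed` is the replica set setting that controls whether secondaries may sync from other secondaries.

```python
import copy


def with_chaining_disabled(config):
    """Return a copy of a replica set config dict with chained replication off.

    `config` is assumed to have the shape of the document returned by
    replSetGetConfig (keys such as "_id", "version", "members", "settings").
    The input dict is left unmodified.
    """
    new_config = copy.deepcopy(config)

    # settings.chainingAllowed = False makes secondaries sync only from the primary.
    settings = new_config.setdefault("settings", {})
    settings["chainingAllowed"] = False

    # Assumption: a reconfig sent as a raw command should carry a higher version
    # number than the current config, so bump it here.
    new_config["version"] = new_config.get("version", 0) + 1
    return new_config


if __name__ == "__main__":
    # Hypothetical three-member config, for illustration only.
    current = {
        "_id": "rs0",
        "version": 7,
        "members": [
            {"_id": 0, "host": "node0.example:27017", "priority": 2},
            {"_id": 1, "host": "node1.example:27017", "priority": 1},
            {"_id": 2, "host": "node2.example:27017", "priority": 1},
        ],
        "settings": {"chainingAllowed": True},
    }
    updated = with_chaining_disabled(current)
    print(updated["settings"], updated["version"])
```

The returned dict would then be passed to `replSetReconfig` on the primary. In the mongo shell, the equivalent is to set `cfg.settings.chainingAllowed = false` on the object returned by `rs.conf()` and pass it to `rs.reconfig(cfg)`.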