[SERVER-27573] Member stuck in ROLLBACK after unclean restart Created: 04/Jan/17 Updated: 29/Jul/17 Resolved: 26/Jun/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.11 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Amanpreet Singh | Assignee: | Kelsey Schubert |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | It does not happen every time. 1. Kill the primary member with an OOM kill |
| Participants: |
| Description |
|
The primary member of the 3-node replica set was OOM-killed, and a secondary member was promoted to primary. When the killed member was restarted, it came up but remained stuck in the ROLLBACK state with these logs:
The error suggests a network issue, which is incorrect. The servers can reach each other just fine:
I can even connect the mongo shell to the remote server and run queries without problems. Also, if I delete all data and do a full resync, the member is able to connect without any issues. |
| Comments |
| Comment by Kelsey Schubert [ 26/Jun/17 ] |
|
Sorry for the delay in getting back to you. Unfortunately, we have not been able to reproduce this issue. If this continues to be an issue, I would recommend posting on the mongodb-user group or Stack Overflow with the mongodb tag for MongoDB-related support discussion. It's possible that a particular network configuration is contributing to this issue. Additionally, please note that we have SERVER-20739 open to track work to improve how we handle network errors during rollback, which may improve the behavior you're observing – please feel free to vote for it and watch it for updates. Kind regards, |
| Comment by Amanpreet Singh [ 13/Jan/17 ] |
|
Hey Thomas, Sorry about that. I've attached some other relevant log files as well. Thanks! |
| Comment by Kelsey Schubert [ 12/Jan/17 ] |
|
I took a look at the logs you've provided, but I am not seeing the rollback described. Would you please ensure that the correct log files have been uploaded? Thank you, |
| Comment by Amanpreet Singh [ 08/Jan/17 ] |
|
Hi @Thomas Schubert, I've attached logs and a screenshot of the MMS dashboard to give some context. Thanks! |
| Comment by Kelsey Schubert [ 04/Jan/17 ] |
|
Thanks for the report. So we can continue to investigate, would you please upload the complete logs for the affected mongod and the primary node? Kind regards, |