[DOCS-15060] Investigate changes in SERVER-56756: Primary cannot stepDown when experiencing disk failures Created: 24/Jan/22 Updated: 13/Nov/23 Resolved: 23/Feb/22 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | manual, Server |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Backlog - Core Eng Program Management Team | Assignee: | Jocelyn Mendez |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Participants: | |||||||||
| Days since reply: | 1 year, 50 weeks, 1 day ago | ||||||||
| Epic Link: | DOCSP-19447 | ||||||||
| Description |
|
Downstream Change Summary

We are adding the parameter fassertOnLockTimeoutForStepUpDown, which controls whether the server fasserts if it times out acquiring the lock for a step up or step down command. This allows a cluster to elect a new primary in rare error conditions, such as a disk failure. For more information please look at related

Description of Linked Ticket

Sending a step down request to a primary that is experiencing disk failures could result in consistent time-out errors:
The error is returned from here and the behavior is easy to reproduce. I've tested the behavior on v4.0.23. Also, I tried to attach GDB to the primary to collect stack-traces, but GDB hangs and I haven't been able to find an alternative yet. |
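
For readers of this ticket, the behavior the new parameter controls can be sketched roughly as follows. This is a minimal Python simulation, not the server's implementation: the helper acquire_for_step_down, the exception classes, and the boolean flag are all hypothetical names introduced for illustration, and a plain threading.Lock stands in for the lock that a step up or step down needs. It only shows the two outcomes on a lock time-out: return an error to the caller, or fassert so the node goes away and the cluster can elect a new primary.

```python
import threading


class FatalAssertion(Exception):
    """Stands in for an fassert; a real server would terminate the process."""


class LockTimeout(Exception):
    """Stands in for the time-out error returned to the caller."""


def acquire_for_step_down(lock, timeout_secs, fassert_on_lock_timeout):
    """Try to take the lock needed for a step down (hypothetical helper).

    Returns True when the lock was acquired. On time-out, raises
    FatalAssertion if fassert_on_lock_timeout is set, else LockTimeout.
    """
    if lock.acquire(timeout=timeout_secs):
        return True
    if fassert_on_lock_timeout:
        raise FatalAssertion("timed out acquiring lock for step down")
    raise LockTimeout("timed out acquiring lock for step down")


# Simulate a lock held indefinitely, e.g. by an operation stuck on a failed disk.
stuck_lock = threading.Lock()
stuck_lock.acquire()

for flag in (False, True):
    try:
        acquire_for_step_down(stuck_lock, 0.05, flag)
    except (LockTimeout, FatalAssertion) as exc:
        print(flag, type(exc).__name__)
```

With the flag off, every attempt ends in the same time-out error, which matches the consistent time-outs described above. With the flag on, the time-out is escalated to a fatal assertion instead.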
| Comments |
| Comment by Githook User [ 22/Feb/22 ] |
|
Author: {'name': 'jocelyn-mendez1', 'email': '91144778+jocelyn-mendez1@users.noreply.github.com', 'username': 'jocelyn-mendez1'}Message:
Co-authored-by: Jocelyn Mendez <jocelyn.mendez@Jocelyns-MacBook-Pro.local> |
| Comment by PM Bot [ 24/Jan/22 ] |
|
Downstream changes updated for upstream |