[SERVER-55495] Mongos cannot follow new primary after old one was in uninterruptible sleep state Created: 24/Mar/21 Updated: 06/Dec/22 Resolved: 05/Apr/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.23 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Shuvalov (Inactive) | Assignee: | [DO NOT USE] Backlog - Sharding NYC |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Sharding NYC
|
||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Description |
|
See The idea is that when the disk fails, the primary can spend a long time without stepping down. At some point, presumably, it is moved to the uninterruptible sleep state. Granted, this may or may not be related to the bug. After the old primary is killed with SIGKILL, an election happens, but mongos remains stuck on the old primary and never recovers unless it is killed. The behavior was reproduced in 4.0, but could be present in other branches. |
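When reproducing this, it may help to confirm that the old primary really is in uninterruptible sleep. On Linux, that state is reported as `D` in the field that follows the parenthesised command name in `/proc/<pid>/stat`. The following is a minimal sketch assuming a Linux host; the function names are hypothetical helpers for illustration and are not part of any MongoDB tooling.

```python
def parse_proc_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or ')' characters, so split after the LAST ')'.
    close = stat_line.rfind(")")
    if close == -1:
        raise ValueError("not a /proc/<pid>/stat line")
    fields = stat_line[close + 1:].split()
    if not fields:
        raise ValueError("state field missing")
    return fields[0]


def is_uninterruptible(stat_line: str) -> bool:
    """True if the process is in uninterruptible sleep (state 'D')."""
    return parse_proc_state(stat_line) == "D"


def read_state(pid: int) -> str:
    """Read the current state of a live process (hypothetical helper, Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_state(f.read())
```

Calling `read_state(pid)` with the PID of the suspect mongod would return `"D"` while the process is blocked in uninterruptible sleep.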
| Comments |
| Comment by Andrew Shuvalov (Inactive) [ 24/Mar/21 ] |
|
Follow-up from production incident HELP-22913 |