[SERVER-55495] Mongos cannot follow new primary after old one was in uninterruptible sleep state Created: 24/Mar/21  Updated: 06/Dec/22  Resolved: 05/Apr/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.23
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding NYC
Operating System: ALL

 Description   

See SERVER-55486 for more details.

The scenario is that when the disk fails, the primary can go a long time without stepping down. At some point, presumably, the process enters the uninterruptible sleep state; admittedly, this may or may not be related to the bug. After the primary is killed with SIGKILL, an election takes place, but mongos remains stuck on the old primary and never recovers unless mongos itself is killed.

The behavior was reproduced on 4.0, but it could also be present in other branches.



 Comments   
Comment by Andrew Shuvalov (Inactive) [ 24/Mar/21 ]

Follow-up from production incident HELP-22913.
