[SERVER-45343] Timeout waiters in the ReplicaSetMonitor using an explicit timer Created: 02/Jan/20  Updated: 12/Dec/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Benjamin Caimano (Inactive) Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Cluster Scalability
Participants:
Linked BF Score: 34

 Description   

This work has already been done in the RSM that exists in v4.3/master, but it would be difficult to backport. I suspect the least invasive change we could make would be to use ReplicaSetMonitor::SetState::scheduleWorkAt() to explicitly notify all waiters and thus have each of them evaluate its timeout.
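The suggested approach can be sketched in Python under stated assumptions. Every name here (`FakeTimerQueue`, `schedule_work_at`, `WaiterSet`, `notify_all`, `on_scan_result`) is hypothetical and is not the server's API. `schedule_work_at` only stands in for the role ReplicaSetMonitor::SetState::scheduleWorkAt() would play, and time is simulated in milliseconds so the behavior is deterministic. The sketch shows each waiter arming an explicit timer at its deadline. When that timer fires, all waiters are notified and each checks its own timeout, instead of waiting for some other event to wake them.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional


class FakeTimerQueue:
    """Hypothetical stand-in for an executor's scheduleWorkAt(): runs callbacks
    once simulated time (in milliseconds) reaches their scheduled deadline."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def schedule_work_at(self, when_ms: int, work: Callable[[int], None]) -> None:
        heapq.heappush(self._heap, (when_ms, next(self._seq), work))

    def advance_to(self, now_ms: int) -> None:
        # Run every timer whose deadline has been reached, earliest first.
        while self._heap and self._heap[0][0] <= now_ms:
            _, _, work = heapq.heappop(self._heap)
            work(now_ms)


@dataclass
class Waiter:
    name: str
    deadline_ms: int
    outcome: Optional[str] = None  # None while the waiter is still pending


class WaiterSet:
    """Hypothetical waiter list. Each waiter arms an explicit timer at its
    deadline, so its timeout is evaluated even if nothing else wakes it."""

    def __init__(self, timers: FakeTimerQueue) -> None:
        self._timers = timers
        self.waiters: List[Waiter] = []

    def add_waiter(self, name: str, deadline_ms: int) -> Waiter:
        waiter = Waiter(name, deadline_ms)
        self.waiters.append(waiter)
        # The explicit timer: notify every waiter when this deadline passes.
        self._timers.schedule_work_at(deadline_ms, self.notify_all)
        return waiter

    def notify_all(self, now_ms: int) -> None:
        # Wake every pending waiter and let it evaluate its own deadline.
        for w in self.waiters:
            if w.outcome is None and w.deadline_ms <= now_ms:
                w.outcome = "timed out"

    def on_scan_result(self) -> None:
        # Simplification: assume a scan result satisfies every pending waiter.
        for w in self.waiters:
            if w.outcome is None:
                w.outcome = "satisfied"


timers = FakeTimerQueue()
ws = WaiterSet(timers)
a = ws.add_waiter("a", deadline_ms=100)
b = ws.add_waiter("b", deadline_ms=300)
timers.advance_to(150)  # a's timer fires; b is still within its deadline
print(a.outcome, b.outcome)  # timed out None
```

Arming one timer per waiter is the simplest choice; a variant could keep a single timer armed at the earliest pending deadline. A timer that fires after its waiter has already been satisfied is harmless, because `notify_all` skips waiters that already have an outcome.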



 Comments   
Comment by George Wangensteen [ 25/Apr/22 ]

If I understand the ticket correctly, we already have the behavior we want in the streamable RSM, which is used by default on 4.4+. It's the older "scanning" RSM, which is optional on 4.4 and the default on 4.2, that has the problematic behavior. This ticket documents fixing an issue specifically in the 4.2/"scanning" replica set monitor.

The consequence of not doing this is that we keep this 4.2-only BF, which is fairly infrequent (it happened once in the last 30 days). Because sharding-nyc now owns the RSM, I'll leave it to them to decide whether this work is worth prioritizing and, if so, how to fix it.

Comment by Lauren Lewis (Inactive) [ 21/Dec/21 ]

We haven’t heard back from you in at least 1 year, so I'm going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Generated at Thu Feb 08 05:08:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.