[SERVER-45343] Timeout waiters in the ReplicaSetMonitor using an explicit timer Created: 02/Jan/20 Updated: 12/Dec/23 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Benjamin Caimano (Inactive) | Assignee: | Backlog - Cluster Scalability |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Cluster Scalability |
| Participants: | |
| Linked BF Score: | 34 |
| Description |
|
This work was already done to the RSM that exists in v4.3/master, but it would be difficult to backport. I suspect the least invasive change we could make would be to use ReplicaSetMonitor::SetState::scheduleWorkAt() to explicitly notify, and thus evaluate timeouts for, all waiters. |
| Comments |
| Comment by George Wangensteen [ 25/Apr/22 ] |
|
If I understand the ticket correctly, we have the behavior we want in the streamable RSM that's in use by default on 4.4+; it's the 'old' scanning RSM, which is an option on 4.4 and the default on 4.2, that has the problematic behavior. This ticket documents fixing an issue specific to the 4.2/"scanning" replica set monitor. The consequence of not doing this is a 4.2-only BF that is fairly infrequent (it happened once in the last 30 days). Because sharding-nyc owns the RSM now, I'll leave it up to them to decide whether this work is worth prioritizing and, if so, how to fix it. |
| Comment by Lauren Lewis (Inactive) [ 21/Dec/21 ] |
|
We haven’t heard back from you in at least 1 year, so I'm going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. |