[SERVER-84709] Resharding critical section timeout is not honored on stepdown Created: 09/Jan/24  Updated: 23/Jan/24

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Allison Easton Assignee: Adi Zaimi
Resolution: Unresolved Votes: 0
Labels: cs-subteam3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File cs_timeout_repro.patch    
Assigned Teams:
Cluster Scalability
Operating System: ALL
Steps To Reproduce:

The attached repro is imperfect because it assumes the stepdown happens before the timeout is hit, but it has reproduced the problem fairly consistently in my environment.

Participants:

Description:

The reshardingCriticalSectionTimeoutMillis parameter is intended to bound the amount of time that the critical section will be held during resharding. This is implemented by scheduling a callback which sets an error if the timeout is exceeded.

However, the callback is local to the node that scheduled it, and it appears that it is never rescheduled after a stepdown, so the timeout parameter is ignored once a stepdown occurs.
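
The failure mode described above can be modeled with a small, self-contained simulation. This is only an illustrative sketch in Python, not the server's C++ implementation: the `CoordinatorDoc` and `Primary` classes, the `critical_section_started_at_ms` field, and the `remaining_timeout_ms` helper are all hypothetical names, and rescheduling on step-up from a persisted start time is an assumed remedy rather than a confirmed fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoordinatorDoc:
    """Durable state that survives a stepdown (hypothetical fields)."""
    critical_section_started_at_ms: Optional[int] = None
    aborted: bool = False


@dataclass
class Primary:
    """In-memory state of one primary; discarded when it steps down."""
    timeout_deadline_ms: Optional[int] = None  # stands in for the local callback

    def schedule_timeout(self, deadline_ms: int) -> None:
        self.timeout_deadline_ms = deadline_ms

    def tick(self, now_ms: int, doc: CoordinatorDoc) -> None:
        # The callback sets an error once the deadline has passed.
        if self.timeout_deadline_ms is not None and now_ms >= self.timeout_deadline_ms:
            doc.aborted = True


def remaining_timeout_ms(doc: CoordinatorDoc, timeout_ms: int, now_ms: int) -> int:
    """Time left before the critical section should be aborted (assumed remedy)."""
    elapsed = now_ms - doc.critical_section_started_at_ms
    return max(0, timeout_ms - elapsed)


def simulate(reschedule_on_step_up: bool) -> bool:
    """Return True if the timeout aborted the operation despite a stepdown."""
    timeout_ms = 5000
    doc = CoordinatorDoc()

    # t=0: the first primary enters the critical section and schedules the callback.
    old_primary = Primary()
    doc.critical_section_started_at_ms = 0
    old_primary.schedule_timeout(0 + timeout_ms)

    # t=1000: stepdown. The old primary's in-memory callback is lost;
    # the new primary starts with no callback scheduled.
    new_primary = Primary()
    if reschedule_on_step_up:
        left = remaining_timeout_ms(doc, timeout_ms, now_ms=1000)
        new_primary.schedule_timeout(1000 + left)

    # t=10000: well past the configured timeout.
    new_primary.tick(10000, doc)
    return doc.aborted
```

In this model, `simulate(False)` corresponds to the reported behavior (the critical section outlives the timeout after a stepdown), while `simulate(True)` shows one way the bound could be preserved if the start time were persisted and the callback rescheduled on step-up.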

