Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Assigned Teams: Cluster Scalability
Operating System: ALL
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The reshardingCriticalSectionTimeoutMillis parameter limits how long the critical section is held during resharding: a callback is scheduled that triggers an error if the timeout is exceeded. SERVER-84709 made changes to ensure that the callback is re-scheduled after a stepdown. However, the callback is only re-scheduled if the coordinator is in the blocking-writes phase.
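
As a rough illustration of the behavior described above, the following Python sketch models the decision made on step-up. It is not the server's code: the CoordinatorState enum, the CoordinatorDoc type, the persisted critical_section_expires_at field, and remaining_timeout_on_step_up are all hypothetical names chosen for this example, and whether a deadline is persisted this way is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class CoordinatorState(Enum):
    # Simplified, illustrative subset of coordinator phases.
    APPLYING = auto()
    BLOCKING_WRITES = auto()
    COMMITTING = auto()


@dataclass
class CoordinatorDoc:
    state: CoordinatorState
    # Hypothetical persisted deadline (seconds on some shared clock) at which
    # the critical section should time out; None until writes are blocked.
    critical_section_expires_at: Optional[float] = None


def remaining_timeout_on_step_up(doc: CoordinatorDoc, now: float) -> Optional[float]:
    """Return the delay to re-arm the timeout callback with, or None if it is not re-armed."""
    if doc.critical_section_expires_at is None:
        return None
    # Behavior described in this ticket: only the blocking-writes phase
    # re-schedules the callback; the committing phase does not.
    if doc.state is not CoordinatorState.BLOCKING_WRITES:
        return None
    # Never return a negative delay; 0.0 means "fire immediately".
    return max(0.0, doc.critical_section_expires_at - now)
```

In this model, a coordinator that steps up in COMMITTING gets None back, so no timeout is armed for the rest of the commit.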

Because the timeout callback is not re-scheduled in the committing phase, resharding could block writes for longer than the configured critical section timeout (default 5 seconds), defeating the purpose of the parameter. Simply re-scheduling the callback in that phase is not an obvious fix, however, because aborting during the committing phase may not be safe once commit messages have been sent to participants.

One scenario that could happen:

1. The coordinator transitions to kCommitting.
2. A failover occurs before the participants are told to commit.
3. Upon step-up, the new coordinator primary resumes the commit.
4. Some of the remaining work in the commit protocol takes longer than the time left in the critical section. Example: severe replication lag while waiting for majority.
5. The contract of the reshardingCriticalSectionTimeoutMillis parameter is broken.
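
The scenario above can be put into numbers with a small worked example. This is only an arithmetic sketch: the function name critical_section_overrun and all durations except the 5-second default timeout are made up for illustration.

```python
def critical_section_overrun(
    timeout_s: float,
    blocked_before_failover_s: float,
    failover_s: float,
    remaining_commit_work_s: float,
) -> float:
    """Return how many seconds writes stay blocked beyond the configured timeout.

    Models the case where no timeout callback is armed after step-up, so the
    critical section is held until the commit work finishes.
    """
    total_blocked_s = blocked_before_failover_s + failover_s + remaining_commit_work_s
    return max(0.0, total_blocked_s - timeout_s)


# Hypothetical timeline: writes blocked for 2 s before the failover, 1 s of
# failover, then 4 s waiting on a lagging majority during the resumed commit.
overrun_s = critical_section_overrun(
    timeout_s=5.0,
    blocked_before_failover_s=2.0,
    failover_s=1.0,
    remaining_commit_work_s=4.0,
)
print(overrun_s)  # 2.0
```

With these assumed durations, writes are blocked for 7 seconds in total, 2 seconds past the 5-second limit.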

Is this the intended behavior? Investigate how the timeout should be handled during the committing phase after a failover on the coordinator.

Is related to: SERVER-84709 Resharding critical section timeout is not honored on stepdown (Closed)

Assignee: Unassigned
Reporter: Kruti Shah
Participants: Kruti Shah
Votes: 0
Watchers: 2

Created: Jul 16 2025 09:14:25 PM UTC
Updated: Jul 17 2025 03:31:47 PM UTC
Resolved: Jul 17 2025 03:30:34 PM UTC
