Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 7.0.0, 8.0.0
Component/s: None
Labels:
None

Assigned Teams:

Catalog and Routing
Sprint:
CAR Team 2025-05-12
Confidence Status:
None
Work Order:
3
Size Category:
TBD
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The setAlwaysInterruptAtStepDownOrUp_UNSAFE() function on the operation context is unsafe in the sense that if a stepdown happens while the function is running, that stepdown (and the resulting interruption) is missed.

If a component tries to acquire the DDL lock after the node has become a secondary, it hangs for 5 minutes (the default timeout for DDL lock acquisition), because the acquisition waits for the recovery of the ShardingDDLCoordinator, which will never happen now that the node is a secondary.
Note that this can only happen if the node became a primary, started recovery, and became a secondary again while that recovery was still in progress.

On 8.1+ this is fixed by having the ShardingDDLCoordinator implement a Recoverable interface; that implementation always reflects the actual state of the node (it triggers on the change from Recovering instead of waiting for Recovered). For further information, see SERVER-90371.

This ticket is to fix the issue on 8.0 and 7.0 in a simpler way, with an "optimistic double-check lock".
First we wait for the DDL coordinator recovery with a relatively small timeout (100ms, but configurable). If that wait times out, we double-check whether we are still the primary by taking the RSTL (through the global lock) and checking our role. If we are no longer the primary, we can interrupt the DDL lock acquisition. If we are still the primary, we can wait for the DDL lock acquisition for a longer time.
Before waiting for the recovery, we have to make sure the context is marked as interruptible on stepdown. After the primary check, we can be sure we won't miss any stepdown interrupts.
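
The control flow described above can be sketched in standalone C++. This is only an illustration of the "short wait, re-check role, long wait" pattern, not the server code: RecoveryGate, NodeRole, WaitResult and waitForRecoveryDoubleChecked are hypothetical names, a plain mutex stands in for the RSTL, and marking the operation context as interruptible on stepdown is not modelled.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

using Millis = std::chrono::milliseconds;

enum class WaitResult { kRecovered, kNotPrimary, kTimedOut };

// Hypothetical stand-in for the DDL coordinator recovery signal.
class RecoveryGate {
public:
    void markRecovered() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _recovered = true;
        }
        _cv.notify_all();
    }

    // Returns true if recovery completed before `timeout` elapsed.
    bool waitFor(Millis timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [this] { return _recovered; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _recovered = false;
};

// Hypothetical stand-in for the node's replication role. The mutex plays
// the part of the RSTL: the role cannot change while it is held.
class NodeRole {
public:
    void setPrimary(bool isPrimary) {
        std::lock_guard<std::mutex> lk(_rstl);
        _isPrimary = isPrimary;
    }

    bool isPrimary() {
        std::lock_guard<std::mutex> lk(_rstl);
        return _isPrimary;
    }

private:
    std::mutex _rstl;
    bool _isPrimary = true;
};

WaitResult waitForRecoveryDoubleChecked(RecoveryGate& gate,
                                        NodeRole& role,
                                        Millis shortTimeout,
                                        Millis longTimeout) {
    assert(shortTimeout <= longTimeout);

    // Optimistic attempt: recovery usually completes quickly.
    if (gate.waitFor(shortTimeout)) {
        return WaitResult::kRecovered;
    }

    // Double check: if the node is no longer primary, recovery will never
    // finish, so give up now instead of waiting out the long timeout.
    if (!role.isPrimary()) {
        return WaitResult::kNotPrimary;
    }

    // Still primary: wait with the long timeout.
    return gate.waitFor(longTimeout) ? WaitResult::kRecovered
                                     : WaitResult::kTimedOut;
}
```

In this sketch a caller that gets kNotPrimary back after roughly shortTimeout would abort the DDL lock acquisition, rather than blocking for the full default timeout. A stepdown that happens during the long wait is not handled here; the description above relies on the context already being interruptible on stepdown for that case.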

Assignee: Wolfee Farkas
Reporter: Wolfee Farkas
Participants: Wolfee Farkas
Votes: 0
Watchers: 4

Created: Apr 25 2025 09:01:05 AM UTC
Updated: Jun 02 2025 06:04:09 PM UTC
Confidence Status Last Update: 25/Apr/25 11:55 AM
