- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Catalog and Routing
- Fully Compatible
- CAR Team 2024-10-28, CAR Team 2024-11-11, CAR Team 2024-11-25
- 200
- 3
checkMetadataConsistency acquires a MODE_S lock on the database to keep the catalog stable while it runs its checks. This strong lock interacts poorly with long-running intention locks, such as those held by resharding. For instance:
- Resharding acquires IX lock on DB
- checkMetadataConsistency enqueues MODE_S lock
- Any write that tries to acquire another IX lock blocks behind the pending MODE_S request until either resharding and checkMetadataConsistency both complete or the MODE_S request times out (5 minutes by default).
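The sequence above can be reproduced with a toy model. This is only a sketch of a fair lock queue, not the server's lock manager: the `ToyLock` class, its methods, and the owner names are hypothetical, and the rule that a new request queues behind any conflicting waiter is an assumption made to mirror the behavior described in this ticket.

```python
from collections import deque

# Standard multi-granularity compatibility: COMPATIBLE[a] lists the modes
# that can be held at the same time as mode a.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ToyLock:
    """Hypothetical single-resource lock with a fair (FIFO) wait queue."""

    def __init__(self):
        self.granted = []       # list of (owner, mode) currently holding the lock
        self.waiting = deque()  # list of (owner, mode) queued requests

    def request(self, owner, mode):
        """Return True if granted immediately, False if the request is enqueued."""
        blocked_by_holder = any(mode not in COMPATIBLE[m] for _, m in self.granted)
        # Assumption: to avoid starving waiters, a new request also queues
        # behind any already-waiting request it conflicts with.
        blocked_by_waiter = any(mode not in COMPATIBLE[m] for _, m in self.waiting)
        if blocked_by_holder or blocked_by_waiter:
            self.waiting.append((owner, mode))
            return False
        self.granted.append((owner, mode))
        return True

    def release(self, owner):
        """Drop the owner's lock, then grant waiters in order until one conflicts."""
        self.granted = [(o, m) for o, m in self.granted if o != owner]
        while self.waiting:
            o, m = self.waiting[0]
            if any(m not in COMPATIBLE[g] for _, g in self.granted):
                break
            self.waiting.popleft()
            self.granted.append((o, m))


db_lock = ToyLock()
print(db_lock.request("resharding", "IX"))               # granted
print(db_lock.request("checkMetadataConsistency", "S"))  # enqueued behind IX
print(db_lock.request("write", "IX"))                    # enqueued behind the pending S
```

In this model the write is compatible with the IX lock that resharding holds, yet it still waits, because the pending S request sits ahead of it in the queue.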
This is a potential problem in production and also in testing, since we run checkMetadataConsistency in the background and some suites also run background collection migrations (moveCollection/resharding).
One idea is to add a try-lock API with backoff so that the MODE_S lock is not enqueued right away. If the operation would otherwise starve, we can either fail it or eventually enqueue it as we do today.
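A minimal sketch of that idea, assuming a non-blocking `try_lock` callable and a blocking `enqueue_lock` callable are available. Both are hypothetical placeholders, as are the function name, the parameter names, and the default values; none of them correspond to an existing server API.

```python
import time


def acquire_with_backoff(try_lock, enqueue_lock, max_attempts=5,
                         initial_delay=0.01, max_delay=1.0,
                         fail_on_starvation=False, sleep=time.sleep):
    """Try a non-blocking acquire a few times before falling back.

    try_lock:     hypothetical callable, returns True if the lock was taken
                  without ever entering the wait queue.
    enqueue_lock: hypothetical callable, blocks in the queue as done today.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        if try_lock():
            return "acquired"
        # Back off between attempts so intention locks are not blocked
        # behind a queued strong request in the meantime.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)

    # The operation is starving: either give up or enqueue as before.
    if fail_on_starvation:
        raise TimeoutError("could not acquire lock without enqueueing")
    enqueue_lock()
    return "enqueued"
```

The `fail_on_starvation` flag maps to the two outcomes mentioned above: failing the operation, or falling back to the current enqueue behavior once the retry budget is exhausted.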