[SERVER-56600] Make DDL coordinators not wait for DistLock in the coordinator's thread pool Created: 04/May/21  Updated: 06/Dec/22  Resolved: 13/Aug/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: Backlog
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Tommaso Tocci
Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Won't Do
Votes: 0
Labels: sharding-wfbf-sprint
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-56598 Sharding DDL coordinators block runni... Closed
Related
is related to SERVER-56599 Make ShardingDDLCoordinatorService th... Closed
Assigned Teams:
Sharding EMEA
Participants:

 Description   

Currently, sharding DDL coordinators attempt to acquire dist-locks in their construction phase. This acquisition is blocking, and it can only complete once other coordinators have finished and released the locks. Since all coordinators use the same shared threadpool, we can end up with the following deadlock on thread acquisition:

  • coordinator 1 waits on the dist-lock acquisition without yielding its thread (T1)
  • coordinator 2 waits for (T1) to become available in order to release the dist-lock.
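
The scenario above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the server's code: `simulate`, `coordinator_1`, `coordinator_2` and `dist_lock_released` are hypothetical names, a `threading.Event` stands in for the dist-lock, and a timeout is added so the demo reports starvation instead of hanging forever as a real deadlock would.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate(pool_size: int, wait_timeout: float = 0.5) -> bool:
    """Return True if coordinator 1 got the lock, False if it starved.

    Hypothetical model: the Event represents a dist-lock currently
    held by coordinator 2.
    """
    dist_lock_released = threading.Event()

    def coordinator_1() -> bool:
        # Blocks its pool thread (T1) while waiting for the lock,
        # without yielding the thread back to the pool.
        return dist_lock_released.wait(timeout=wait_timeout)

    def coordinator_2() -> None:
        # Needs a pool thread in order to run and release the lock.
        dist_lock_released.set()

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        first = pool.submit(coordinator_1)
        pool.submit(coordinator_2)
        return first.result()


if __name__ == "__main__":
    print("pool of 1, lock acquired:", simulate(1))
    print("pool of 2, lock acquired:", simulate(2))
```

With a single thread, coordinator 2 is queued behind the blocked coordinator 1 and can never release the lock in time. With a second thread it runs immediately, which is why an uncapped pool hides the problem rather than fixing it.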

In SERVER-56599 the threadpool used by the ShardingDDLCoordinatorService was made unlimited in size because of several bugs.

The goal of this ticket is to cap the threadpool size again once all the related bugs have been fixed.



 Comments   
Comment by Tommaso Tocci [ 13/Aug/21 ]

We won't be able to do this until we have a non-blocking API for distLock or local lock acquisition.
