Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Affects Version/s: None
Component/s: None
Serverless
ALL
SERVER-60752 introduced a new API for detecting conflicts among POS (PrimaryOnlyService) instances. As part of that ticket, to detect conflicts between instances that share a tenant ID but have different migration IDs, the tenant migration donor (TMD) API was made to wait for the existing TMD instance's initial state doc to be majority committed. This can lead to a 3-way deadlock. I think SERVER-60953 accidentally fixed that 3-way deadlock, but in doing so made the API (PrimaryOnlyService::checkIfConflictsWithOtherInstances) racy.
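As a rough illustration of the conflict rule described above (same tenant ID, different migration ID), here is a minimal Python sketch. `DonorStateDoc`, `conflicts_with` and `find_conflict` are hypothetical names used only for illustration; they are not the PrimaryOnlyService API, whose real entry point is `PrimaryOnlyService::checkIfConflictsWithOtherInstances` in C++.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DonorStateDoc:
    """Hypothetical stand-in for a TMD instance's initial state doc."""
    migration_id: str
    tenant_id: str


def conflicts_with(new: DonorStateDoc, existing: DonorStateDoc) -> bool:
    # Assumed rule from the ticket: same tenant ID but a different migration ID.
    return new.tenant_id == existing.tenant_id and new.migration_id != existing.migration_id


def find_conflict(new: DonorStateDoc,
                  existing: Iterable[DonorStateDoc]) -> Optional[DonorStateDoc]:
    # Return the first existing instance that conflicts with `new`, or None.
    for doc in existing:
        if conflicts_with(new, doc):
            return doc
    return None
```

For example, `find_conflict(DonorStateDoc("migration-2", "tenant-1"), [DonorStateDoc("migration-1", "tenant-1")])` returns the Instance 1 doc. The sketch only captures the predicate; it does not model when an existing instance's state doc becomes visible to the check, which is where the race in this ticket lies.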
Racy scenario:
| TMD Instance 1 (Migration ID 1 + Tenant ID 1) | TMD Instance 2 (Migration ID 2 + Tenant ID 1) |
|---|---|
| Calls getOrCreateInstance() | |
| | Calls getOrCreateInstance() |
| Instance1::run() starts | |
| | Instance2::run() starts |
Additional notes on the 3-way deadlock scenario:
1) Instance 2 holds the POS mutex (as part of getOrCreateInstance()) and waits for Instance 1's initial state doc to be majority committed.
2) The stepdown thread holds the RSTL in mode X and tries to acquire the POS mutex to execute the POS onStepDown() (to interrupt active instances), so it blocks behind Instance 2.
3) Instance 1 tries to acquire the RSTL in mode IX to write its initial state doc, but blocks behind the stepdown thread.
Each thread is waiting on the next one in this chain, so none of the three can make progress.
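The three waits above form a cycle in a wait-for graph. The following minimal Python sketch models that graph to make the cycle explicit. The thread labels (`instance1`, `instance2`, `stepdown`) and the `find_cycle` helper are hypothetical; the sketch models only who waits on whom, not MongoDB's lock manager, the RSTL, or the POS mutex themselves.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph for the three blocked threads.
# An edge A -> B means "A cannot proceed until B releases or completes something".
WAITS_FOR: Dict[str, str] = {
    # Holds the POS mutex; waits for Instance 1's initial state doc to be majority committed.
    "instance2": "instance1",
    # Holds the RSTL in mode X; waits for the POS mutex held by Instance 2.
    "stepdown": "instance2",
    # Needs the RSTL in mode IX to write its initial state doc; blocked behind stepdown.
    "instance1": "stepdown",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle (closed path) if one is reached."""
    path: List[str] = []
    seen = set()
    node: Optional[str] = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ends at a thread that is not waiting on anyone
    return path[path.index(node):] + [node]
```

Here `find_cycle(WAITS_FOR, "instance2")` yields `["instance2", "instance1", "stepdown", "instance2"]`. Removing any single edge (for example, if Instance 2 did not block while holding the POS mutex) leaves a chain with no cycle, so every thread can eventually proceed.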
is related to:
- SERVER-60752 API for detecting conflicts among PrimaryOnlyService instances (Closed)
- SERVER-60953 TenantMigrationDonorService::getDurableState should return a future (Closed)