- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- Fully Compatible
- ALL
- ClusterScalability 2Feb-16Feb
- 200
The deadlock is caused by the fact that the registrationTime stored for a task enqueued into _rangeDeletionTasks can differ from the registrationTime captured by the lambda for that same task when the task doesn't have an explicit time set: both independently fall back to the current time when none is provided, so they can read different clock values. This breaks the symmetry of the comparison, making it possible for two tasks to wait on each other.
To make this more concrete, consider the following example:
- Register task A [10, 30] without an explicit timestamp.
- The entry pushed onto _rangeDeletionTasks gets a registrationTime of now(), say 100.
- Later, when scheduleRangeDeletionChain runs, it calls now() again to compute the registrationTime captured by task A's lambda. Say the captured value is 101.
- Then we register task B [5, 15], also without an explicit timestamp.
- The entry pushed onto _rangeDeletionTasks gets a default registrationTime of now(), say 101.
- Later, when scheduleRangeDeletionChain runs, it calls now() again to compute the registrationTime captured by task B's lambda. Say the captured value is 102.
- When we decide ordering/dependencies, we compare the captured registrationTime against the queued one:
- Task B ends up waiting on task A because its lambda-captured registration time of 102 is greater than the registrationTime of 100 on the entry for task A.
- Task A ends up waiting on task B because its lambda-captured registration time of 101 equals the registrationTime of 101 on the entry for task B, and its taskId can be smaller than task B's (see the comparison logic here).