[SERVER-43214] Consider decreasing distlock retry timeout Created: 06/Sep/19 Updated: 06/Dec/22 Resolved: 16/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | 3.6.14, 4.0.12, 4.2.0 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Sharding |
| Participants: | |||||
| Linked BF Score: | 34 | ||||
| Description |
|
If a thread tries to newly acquire the distlock on an aggressively quick cadence (less than a second between attempts), threads that have already been waiting on the distlock can get crowded out by the new requests. We should decrease the retry timeout so that retrying threads have a more equitable chance of acquiring the distlock. Alternatively, this ticket may act as a placeholder for preferable solutions to the linked BF. |
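The crowding-out effect described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the server's distlock implementation: the aggressive thread is modeled as holding the lock for `hold_ms` out of every `hold_ms + gap_ms` milliseconds, and a waiting thread simply polls once every `retry_ms`. All function names and the numbers used (90 ms hold, 10 ms gap, 500 ms and 30 ms retry intervals) are hypothetical and not taken from this ticket.

```python
from typing import Optional


def lock_is_free(t_ms: int, hold_ms: int, gap_ms: int) -> bool:
    """Toy model (assumption): the aggressive thread holds the lock for the
    first hold_ms of every (hold_ms + gap_ms) cycle and leaves it free for
    the remaining gap_ms."""
    phase = t_ms % (hold_ms + gap_ms)
    return phase >= hold_ms


def first_acquire_time(
    retry_ms: int,
    hold_ms: int,
    gap_ms: int,
    start_ms: int = 0,
    horizon_ms: int = 60_000,
) -> Optional[int]:
    """Return the first time (ms) at which a waiter polling every retry_ms
    finds the lock free, or None if it never does within horizon_ms."""
    t = start_ms
    while t <= horizon_ms:
        if lock_is_free(t, hold_ms, gap_ms):
            return t
        t += retry_ms
    return None


if __name__ == "__main__":
    # Hypothetical numbers: lock held 90 ms, free 10 ms, repeating.
    slow = first_acquire_time(retry_ms=500, hold_ms=90, gap_ms=10)
    fast = first_acquire_time(retry_ms=30, hold_ms=90, gap_ms=10)
    print("500 ms retry:", slow)  # None: every poll lands while the lock is held
    print("30 ms retry:", fast)   # 90: the fourth poll lands in the free window
```

In this model a long retry interval can keep sampling the same busy part of the cycle, while a shorter interval gives the waiter more chances to land in a brief free window. Real acquisition also depends on network round trips and other contenders, which the sketch ignores.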
| Comments |
| Comment by Mira Carey [ 16/Sep/19 ] |
|
We'll make an effort later on to allow the system to work in the presence of a very active balancer. However, without first doing the work to allow writes to proceed, this change isn't immediately valuable. |
| Comment by Blake Oler [ 13/Sep/19 ] |
Does the ticket fall into the category of "Sharding Catalog Inconsistency" issues? No
What is the user-visible or BF/Evergreen-visible effect of this ticket? Is it data corruption or data loss? No
How likely is it to happen (in your opinion, if you don't have insight)? Very likely if we recommit the random balancer policy.
Does it cause Build Baron nuisance and noise in the sharding tests? It will if we recommit the random balancer policy.
Does it look like it could be a Support nuisance, such as having to call support for something stupid such as clearing some jumbo chunk, etc.? No
How difficult/risky is it to fix? 4/4 – might be an easy fix to implement, but could have wide-reaching implications on the system as a whole. |