[SERVER-43214] Consider decreasing distlock retry timeout Created: 06/Sep/19  Updated: 06/Dec/22  Resolved: 16/Sep/19

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 3.6.14, 4.0.12, 4.2.0
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Sharding
Participants:
Linked BF Score: 34

 Description   

If a thread issues fresh distlock acquisition attempts on an aggressively quick cadence (less than a second apart), threads that have been waiting on the distlock can get crowded out by the new requests. We should decrease the retry timeout so that retrying threads have a more equitable chance to acquire the distlock.
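The starvation pattern above can be illustrated with a minimal discrete-event simulation. This is a hypothetical sketch, not MongoDB code: the intervals, hold time, and `simulate` helper are all illustrative, standing in for the distlock retry timeout and lock hold duration. One thread polls on a sub-second cadence while a waiter retries on a longer timeout; shrinking the waiter's retry timeout moves its share of acquisitions toward parity.

```python
import heapq
import random


def simulate(retry_intervals, hold_time=5.0, total_time=10_000.0, seed=42):
    """Sketch of polling contention on a single lock.

    Thread i re-attempts acquisition roughly every retry_intervals[i]
    time units (with jitter so attempts don't land in lockstep). An
    attempt that arrives while the lock is free wins and holds the lock
    for hold_time. Returns acquisition counts per thread.
    """
    rng = random.Random(seed)
    counts = [0] * len(retry_intervals)
    # Min-heap of (next_attempt_time, thread_id).
    events = [(rng.random(), i) for i in range(len(retry_intervals))]
    heapq.heapify(events)
    lock_free_at = 0.0
    while events:
        t, i = heapq.heappop(events)
        if t > total_time:
            continue  # past the horizon: drop this thread's events
        delay = retry_intervals[i] * (0.5 + rng.random())  # jittered interval
        if t >= lock_free_at:
            counts[i] += 1                 # acquired the lock at time t
            lock_free_at = t + hold_time
            next_t = lock_free_at + delay  # next fresh request after release
        else:
            next_t = t + delay             # busy: back off and retry
        heapq.heappush(events, (next_t, i))
    return counts


# Thread 0 issues fresh acquire attempts every ~0.5 units; thread 1 is a
# waiter whose share of acquisitions depends on its retry timeout.
starved = simulate([0.5, 20.0])  # long retry timeout: crowded out
fairer = simulate([0.5, 1.0])    # shorter retry timeout: closer to parity
print("long timeout share:  %.2f" % (starved[1] / sum(starved)))
print("short timeout share: %.2f" % (fairer[1] / sum(fairer)))
```

The waiter with the long retry timeout only wins when its infrequent poll happens to land in the short window between a release and the aggressive thread's next attempt, which is why decreasing the retry timeout (or adding jitter) gives retrying threads a fairer shot.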

This ticket may alternatively act as a placeholder for preferable solutions to the linked BF.



 Comments   
Comment by Mira Carey [ 16/Sep/19 ]

We'll make an effort later on to allow the system to work in the presence of a very active balancer.

However, without also doing the work to allow writes to proceed, this change isn't immediately valuable.

Comment by Blake Oler [ 13/Sep/19 ]

Does the ticket fall into the category of "Sharding Catalog Inconsistency" issues?

No

What is the user-visible or BF/Evergreen visible effect of this ticket?

Is it data corruption or data loss?

No

How likely is it to happen (in your opinion, if you don't have insight)?

Very likely if we recommit the random balancer policy

Does it cause Build Baron nuisance and noise in the sharding tests?

It will if we recommit the random balancer policy

Does it look like it could be a Support nuisance, such as having to call Support for something trivial like clearing a jumbo chunk, etc.?

No

How difficult/risky is it to fix?

4/4 – it might be an easy fix to implement, but it could have wide-reaching implications on the system as a whole.

Generated at Thu Feb 08 05:02:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.