[SERVER-75262] Add a passthrough test that exercises ticket exhaustion Created: 24/Mar/23  Updated: 06/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Matt Kneiser Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: storex-ranked
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-76991 Create a "kitchen sink" suite Open
is related to SERVER-75205 Deadlock between stepdown and restori... Closed
is related to SERVER-76834 Circular wait dependency between PBWM... Closed
is related to SERVER-76835 Deadlock in the shard filtering metad... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2023-05-01, Execution Team 2023-05-15, Execution Team 2023-05-29, Execution Team 2023-06-12, Execution EMEA Team 2023-06-26, Execution EMEA Team 2023-07-10, Execution EMEA Team 2023-07-24, Execution EMEA Team 2023-08-07, Execution EMEA Team 2023-08-21
Participants:
Linked BF Score: 120

 Description   

A key feature of the deadlock in the linked issue is read ticket exhaustion, which existing testing does not cover well.

 

This ticket tracks adding coverage to help catch this class of deadlocks by limiting concurrency and forcing queries to yield more often.
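
As a rough illustration of what "limiting concurrency and forcing queries to yield more often" could look like, the sketch below builds mongod --setParameter flags. It is not taken from the ticket or its PR. The helper names are hypothetical, and the parameter names (wiredTigerConcurrentReadTransactions, wiredTigerConcurrentWriteTransactions, internalQueryExecYieldIterations, internalQueryExecYieldPeriodMS) are assumed to match the server version under test; they may be named differently on other versions. The defaults use 5 tickets, the minimum mentioned in the comments.

```python
# Hypothetical helpers, for illustration only: they are not part of resmoke or
# the config fuzzer. Parameter names are assumptions about the target version.

def low_ticket_yield_params(read_tickets=5, write_tickets=5,
                            yield_iterations=1, yield_period_ms=1):
    """Return server parameters that shrink the ticket pool and make
    query executors yield as often as possible."""
    for name, value in (("read_tickets", read_tickets),
                        ("write_tickets", write_tickets),
                        ("yield_iterations", yield_iterations),
                        ("yield_period_ms", yield_period_ms)):
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    return {
        # Fewer tickets make exhaustion reachable under test load.
        "wiredTigerConcurrentReadTransactions": read_tickets,
        "wiredTigerConcurrentWriteTransactions": write_tickets,
        # Small values make a query yield after very little work or time.
        "internalQueryExecYieldIterations": yield_iterations,
        "internalQueryExecYieldPeriodMS": yield_period_ms,
    }


def to_mongod_args(params):
    """Flatten a parameter dict into --setParameter command-line arguments."""
    args = []
    for name, value in params.items():
        args += ["--setParameter", f"{name}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["mongod"] + to_mongod_args(low_ticket_yield_params())))
```

Whether values this aggressive produce useful signal or mostly noise is an open question in the comments below, so the numbers here are placeholders rather than recommendations.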



 Comments   
Comment by Josef Ahmad [ 08/Aug/23 ]

I am putting this back into Open state, as it has lagged behind other priorities. The PR introduces new config-fuzzed tasks, which in turn generate some failures, and I've been trying to disentangle the expected failures from the unexpected ones.

Comment by Judah Schvimer [ 05/May/23 ]

We should ensure we include fuzzers, core, and concurrency to get complete coverage.

Comment by Louis Williams [ 24/Mar/23 ]

I ran a patch build on 4.4 with the default tickets at the minimum and did not see any relevant failures. We also already lower the default number of tickets in the config_fuzzer.

So I still think there's an element of test coverage that we're missing beyond lowering tickets. Increasing the yield period in the config fuzzer would be a good start. But the minimum number of tickets, 5, might be too low.

I would also prefer to use the config fuzzer rather than build a bespoke passthrough suite that adjusts some server parameters. We could also expand coverage by creating a config_fuzzer suite that performs stepdowns, since we don't have one already.

Generated at Thu Feb 08 06:29:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.