[SERVER-75074] Reduce the number of retries when reopening buckets Created: 20/Mar/23  Updated: 14/Apr/23  Resolved: 14/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Fausto Leyva (Inactive) Assignee: Fausto Leyva (Inactive)
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-75094 Fix indefinite WriteConflict return w... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2023-05-01
Participants:

Description

In the time-series (TS) scalability project, we introduced bucket reopening. If we fetch a bucket that might be stale, we re-fetch it, and we currently do so with no upper bound on the number of attempts. This is especially dangerous if we continuously receive WriteConflict errors. See [write_ops_exec.cpp:insertIntoBucketCatalog].

 

My suggestion is to cap the number of retries at a small value such as 3. The exact number is somewhat arbitrary, but I think it could still be beneficial to retry a few times before inserting into a new bucket. Otherwise, we should consider not returning WriteConflicts during the reopening process at all, and instead insert into a new bucket after a single reopening failure.
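
To make the proposal concrete, here is a minimal sketch of a bounded retry loop. It is not the actual code in write_ops_exec.cpp: ReopenResult, InsertOutcome, insertWithBoundedReopen, and the tryReopen callback are all hypothetical names invented for illustration, and the default cap of 3 is just the value suggested above.

```cpp
#include <functional>

// Hypothetical outcome of a single reopening attempt (not a real server type).
enum class ReopenResult { kSuccess, kWriteConflict, kNoCandidate };

// Hypothetical summary of where the measurement ended up.
struct InsertOutcome {
    bool usedReopenedBucket;  // false means we fell back to a new bucket
    int attempts;             // number of reopening attempts actually made
};

// Tries to reopen an existing bucket at most maxAttempts times. Only a
// WriteConflict is retried; any other failure, or exhausting the cap,
// falls back to inserting into a new bucket.
InsertOutcome insertWithBoundedReopen(const std::function<ReopenResult()>& tryReopen,
                                      int maxAttempts = 3) {
    int attempts = 0;
    while (attempts < maxAttempts) {
        ++attempts;
        const ReopenResult result = tryReopen();
        if (result == ReopenResult::kSuccess) {
            return {true, attempts};
        }
        if (result != ReopenResult::kWriteConflict) {
            break;  // nothing suitable to reopen, so retrying will not help
        }
    }
    return {false, attempts};  // cap reached or no candidate: use a new bucket
}
```

In this sketch, passing maxAttempts = 1 models the alternative described above, where a single reopening failure immediately falls back to a new bucket.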

 

For context, this surfaced in SERVER-73094, where we test concurrent multi-deletes and inserts into the same or overlapping buckets.


Generated at Thu Feb 08 06:29:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.