[SERVER-72721] Race in TicketHolderTests Created: 11/Jan/23  Updated: 27/Oct/23  Resolved: 17/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Backlog - Storage Execution Team
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Storage Execution
Operating System: ALL
Steps To Reproduce:
  • Comment out all tests in ticketholder_test.cpp except for PriorityTwoQueuedOperations.
  • Apply this diff

    diff --git a/src/mongo/util/concurrency/priority_ticketholder.cpp b/src/mongo/util/concurrency/priority_ticketholder.cpp
    index ab234ee3c8f..f48d68d3745 100644
    --- a/src/mongo/util/concurrency/priority_ticketholder.cpp
    +++ b/src/mongo/util/concurrency/priority_ticketholder.cpp
    @@ -85,6 +85,10 @@ boost::optional<Ticket> PriorityTicketHolder::_waitForTicketUntilImpl(OperationC
         _enqueuedElements.addAndFetch(1);
         ON_BLOCK_EXIT([&] { _enqueuedElements.subtractAndFetch(1); });
     
    +    if (admCtx->getPriority() == AdmissionContext::Priority::kNormal) {
    +        sleep(100);
    +    }
    +
         ticket_queues::UniqueLockGuard uniqueQueueLock(_queueMutex);
         do {
             while (_ticketsAvailable.load() <= 0 ||
    

  • Run ninja -j400 +ticketholder_test
Participants:
Linked BF Score: 60

 Description   

Multiple tests in the TicketHolder test suite can fail when a sleep is added between the increment of _enqueuedElements and the acquisition of _queueMutex (see linked BFs). It is not safe to rely on _enqueuedElements to determine whether an operation has been successfully enqueued. The way the TicketHolder is tested should be changed to account for this.
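
A minimal, self-contained C++ sketch of the hazard, assuming only the ordering visible in the repro diff above (the names enqueuedElements, queueMutex, waiterQueue, and the 100 ms delay are illustrative, not MongoDB's actual code):

    #include <atomic>
    #include <chrono>
    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <thread>

    std::atomic<int> enqueuedElements{0};  // bumped before the lock, like _enqueuedElements
    std::mutex queueMutex;                 // like _queueMutex
    std::deque<int> waiterQueue;           // stand-in for the real priority queues

    void waiter(int id) {
        enqueuedElements.fetch_add(1);     // step 1: the counter says "I'm queued"...
        std::this_thread::sleep_for(std::chrono::milliseconds(100));  // the repro's injected delay
        std::lock_guard<std::mutex> lk(queueMutex);
        waiterQueue.push_back(id);         // step 2: ...but only now is it actually queued
    }

    int main() {
        std::thread t(waiter, /*id=*/1);

        // The flawed test pattern: wait on the counter, then assume the waiter is queued.
        while (enqueuedElements.load() < 1) {
            std::this_thread::yield();
        }

        {
            std::lock_guard<std::mutex> lk(queueMutex);
            // With the injected delay this prints "counter=1 actually queued=0".
            std::cout << "counter=" << enqueuedElements.load()
                      << " actually queued=" << waiterQueue.size() << std::endl;
        }

        t.join();
        return 0;
    }

With the injected delay, the checking thread observes a counter of 1 while the queue is still empty, which is exactly the window the repro diff widens in the real TicketHolder.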

Original Description:
This test (PriorityTwoQueuedOperations) checks that both the low- and normal-priority operations have been enqueued by reading a metric that is incremented before the tickets are actually enqueued (see here). This can result in a hang here, because the low-priority operation can get the ticket before the normal-priority operation has actually been enqueued.
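
To make the hang concrete, here is a hedged sketch using a toy two-priority holder rather than MongoDB's PriorityTicketHolder (ToyPriorityHolder, waitForTicket, and releaseOne are invented names): the test-side logic trusts a metric that waiters bump before they actually park, releases a single ticket, and the low-priority waiter takes it because the normal-priority waiter is not really queued yet; the normal-priority waiter is then left waiting, which in the real test shows up as the hang.

    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // Toy two-priority holder: releaseOne() hands a ticket to a queued normal-priority
    // waiter if one exists, otherwise to a queued low-priority waiter.
    struct ToyPriorityHolder {
        std::mutex mtx;
        std::condition_variable cv;
        std::atomic<bool> normalQueued{false}, lowQueued{false};
        bool normalGranted = false, lowGranted = false;
        std::atomic<int> queuedMetric{0};  // bumped *before* the mutex is taken

        void waitForTicket(bool isNormal) {
            queuedMetric.fetch_add(1);     // the metric already says "queued"
            if (isNormal)                  // the delay injected by the repro diff
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
            std::unique_lock<std::mutex> lk(mtx);
            if (isNormal) normalQueued = true; else lowQueued = true;  // actually queued only now
            cv.wait(lk, [&] { return isNormal ? normalGranted : lowGranted; });
        }

        void releaseOne() {
            std::lock_guard<std::mutex> lk(mtx);
            if (normalQueued) normalGranted = true;   // prefer the normal-priority waiter...
            else if (lowQueued) lowGranted = true;    // ...but it is not queued yet
            cv.notify_all();
        }
    };

    int main() {
        ToyPriorityHolder holder;
        std::thread low([&] { holder.waitForTicket(/*isNormal=*/false); });
        std::thread normal([&] { holder.waitForTicket(/*isNormal=*/true); });

        // Flawed test logic: the metric reads 2, so assume both waiters are queued.
        while (holder.queuedMetric.load() < 2) std::this_thread::yield();
        while (!holder.lowQueued.load()) std::this_thread::yield();  // keep the sketch deterministic

        holder.releaseOne();  // goes to the low-priority waiter, not the normal-priority one
        low.join();
        std::cout << "Low-priority waiter got the ticket first.\n";

        // The real test releases nothing further here, so the normal-priority waiter
        // would wait forever: the hang. Release again only so the sketch exits cleanly.
        while (!holder.normalQueued.load()) std::this_thread::yield();
        holder.releaseOne();
        normal.join();
        return 0;
    }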



 Comments   
Comment by Haley Connelly [ 17/Jan/23 ]

With the change to the TicketHolder implementation in SERVER-72072, queued() relies on the brokers to update the number of queued operations while holding a unique lock. This means that if a ticket is released, the releaser must acquire that lock to know whether the ticket can be transferred to a broker. Thus, due to mutual exclusion, the queued() count must be up to date by the time a releaser comes in.

I believe this race, between updating the queued() count and actually registering an operation as waiting, has gone away with the changes in SERVER-72072.
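
For illustration, a hedged sketch of the scheme described above, assuming its essence is that the queued count is only modified while holding the same mutex a releaser must take (QueuedCounter and its members are invented for the example, not MongoDB's actual classes):

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    class QueuedCounter {
    public:
        // Called by a waiter: bumping the count and registering as a waiter happen
        // in one critical section, so they are atomic with respect to releaseToWaiter().
        void waitForTicket() {
            std::unique_lock<std::mutex> lk(_mutex);
            ++_queued;                               // count and registration move together
            _cv.wait(lk, [&] { return _grants > 0; });
            --_grants;
            --_queued;
        }

        // Called by a releaser: under the lock, _queued is exact and can never be
        // "ahead" of the set of waiters actually parked on the condition variable.
        bool releaseToWaiter() {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_queued == _grants)                  // every registered waiter already has a grant
                return false;
            ++_grants;
            _cv.notify_one();
            return true;
        }

        std::int64_t queued() const {
            std::lock_guard<std::mutex> lk(_mutex);
            return _queued;
        }

    private:
        mutable std::mutex _mutex;
        std::condition_variable _cv;
        std::int64_t _queued = 0;
        std::int64_t _grants = 0;
    };

Because incrementing the count and parking on the condition variable happen in one critical section, a releaser holding the same lock can never observe a count that is ahead of the set of registered waiters, so the original race no longer applies.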
