[SERVER-75430] Implement "wait for atomic or interrupt" on OperationContext
Created: 29/Mar/23
Updated: 19/Apr/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Louis Williams
Assignee: Backlog - Service Architecture
Resolution: Unresolved
Votes: 0
Labels: perf-servicearch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Participants:

 Description   

We're using futexes to wait on an atomic in the TicketPool, which is a component of the Execution Control rate-limiter and scheduler. This is how we queue waiters and wake them up when a ticket is available.

The problem is that this wait is not interruptible by the MongoDB interruption mechanism, which just sets an atomic flag. To work around that problem, we wait for 500ms and then re-queue.
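
For illustration, the polling workaround described above might look roughly like the following. This is a minimal Linux-only sketch, not the actual TicketPool code: `WaitResult`, `futexWaitFor`, `tryAcquire`, and `acquireWithPolling` are hypothetical names, the real implementation queues waiters rather than looping in place, and casting a `std::atomic<uint32_t>` to a raw futex word assumes the usual layout on Linux.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption: the atomic has the same size and layout as a raw 32-bit futex word.
static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t), "unexpected atomic layout");

// Hypothetical result type for this sketch only.
enum class WaitResult { kAcquired, kInterrupted };

// Blocks while `word == expected`, for at most `timeoutMs` (relative timeout).
// Returns 0 when woken, or -1 with errno set (ETIMEDOUT, EAGAIN, EINTR).
inline long futexWaitFor(std::atomic<uint32_t>& word, uint32_t expected, long timeoutMs) {
    timespec ts{};
    ts.tv_sec = timeoutMs / 1000;
    ts.tv_nsec = (timeoutMs % 1000) * 1000000L;
    return syscall(SYS_futex, reinterpret_cast<uint32_t*>(&word), FUTEX_WAIT_PRIVATE,
                   expected, &ts, nullptr, 0);
}

// Takes one ticket if any is free; `available` counts free tickets.
inline bool tryAcquire(std::atomic<uint32_t>& available) {
    uint32_t cur = available.load(std::memory_order_acquire);
    while (cur > 0) {
        if (available.compare_exchange_weak(cur, cur - 1, std::memory_order_acq_rel)) {
            return true;
        }
    }
    return false;
}

// The workaround: the futex wait cannot observe the interrupt flag, so the
// waiter wakes every `pollMs` to check it and then goes back to waiting.
inline WaitResult acquireWithPolling(std::atomic<uint32_t>& available,
                                     const std::atomic<bool>& interrupted,
                                     long pollMs = 500) {
    for (;;) {
        if (interrupted.load(std::memory_order_acquire)) {
            return WaitResult::kInterrupted;
        }
        if (tryAcquire(available)) {
            return WaitResult::kAcquired;
        }
        // Sleeps only while no ticket is free; a timeout just re-runs the loop.
        futexWaitFor(available, 0, pollMs);
    }
}
```

In this sketch an interrupt that arrives while the waiter is asleep is only noticed at the next timeout, so `pollMs` bounds the interrupt latency and also sets how often every queued thread wakes up.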

If we enter a state where most operations queue for more than 500ms, we'll likely enter an undesirable metastable failure state where every operation is queueing, timing out, and re-queueing, which does extra context switching and wastes CPU. 500ms is a long interval, but with thousands of client threads the aggregate cost adds up: for example, 5,000 queued threads each timing out every 500ms produce about 10,000 extra wakeups per second.

It would be nice to have something like OperationContext::waitForAtomicOrInterrupt. Linux supports waiting on multiple futex words at once (via the futex_waitv syscall), so I don't believe this would be complicated to support. Our error codes are all less than the uint32_t max value, so an interrupt code would fit in a 32-bit futex word. One challenge is that using a bare futex would circumvent the existing behavior of waitForConditionOrInterrupt, which allows waiters to actively participate in network IO. That said, the ticketholder today already does not participate in that system, so this would not be a change in behavior.
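
As a rough sketch of what the proposal could look like on Linux, the following waits on a ticket word and an interrupt word together using the `futex_waitv` syscall. Everything here is an assumption made for illustration: `waitForAtomicOrInterrupt` is written as a free function rather than a real `OperationContext` member, `WaitOutcome`, `FutexWaitvEntry`, and `interrupt` are hypothetical names, and the interrupt state is modeled as a 32-bit word where 0 means "not interrupted". The entry struct mirrors the kernel's `struct futex_waitv` under a different name so the sketch also compiles against older headers, and 449 is assumed as the syscall number when the headers do not define it.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Same layout as the kernel's `struct futex_waitv` (redeclared for old headers).
struct FutexWaitvEntry {
    uint64_t val;       // value the futex word is expected to hold
    uint64_t uaddr;     // address of the 32-bit futex word
    uint32_t flags;     // FUTEX_32, optionally FUTEX_PRIVATE_FLAG
    uint32_t reserved;  // must be zero
};

constexpr uint32_t kFutex32 = 2;         // FUTEX_32
constexpr uint32_t kFutexPrivate = 128;  // FUTEX_PRIVATE_FLAG
#ifdef SYS_futex_waitv
constexpr long kSysFutexWaitv = SYS_futex_waitv;
#else
constexpr long kSysFutexWaitv = 449;  // assumed syscall number on older headers
#endif

// Hypothetical outcome type for this sketch only.
enum class WaitOutcome { kTicketAvailable, kInterrupted, kUnsupported };

// Blocks until `tickets` or `interruptCode` becomes nonzero. Returns
// kUnsupported if the kernel or sandbox rejects futex_waitv, in which case a
// caller would fall back to the timed-wait approach.
inline WaitOutcome waitForAtomicOrInterrupt(std::atomic<uint32_t>& tickets,
                                            std::atomic<uint32_t>& interruptCode) {
    FutexWaitvEntry entries[2] = {
        {0, reinterpret_cast<uintptr_t>(&tickets), kFutex32 | kFutexPrivate, 0},
        {0, reinterpret_cast<uintptr_t>(&interruptCode), kFutex32 | kFutexPrivate, 0},
    };
    for (;;) {
        if (interruptCode.load(std::memory_order_acquire) != 0) {
            return WaitOutcome::kInterrupted;
        }
        if (tickets.load(std::memory_order_acquire) != 0) {
            return WaitOutcome::kTicketAvailable;
        }
        // Sleeps only while both words still hold 0; no timeout is needed.
        long rc = syscall(kSysFutexWaitv, entries, 2u, 0u,
                          static_cast<timespec*>(nullptr), CLOCK_MONOTONIC);
        if (rc == -1 && errno != EAGAIN && errno != EINTR) {
            return WaitOutcome::kUnsupported;
        }
        // Woken, value changed, or signal: loop and re-check both words.
    }
}

// Interrupt side: publish a nonzero code, then wake any waiter on that word.
inline void interrupt(std::atomic<uint32_t>& interruptCode, uint32_t code) {
    interruptCode.store(code, std::memory_order_release);
    syscall(SYS_futex, reinterpret_cast<uint32_t*>(&interruptCode), FUTEX_WAKE_PRIVATE,
            1, nullptr, nullptr, 0);
}
```

With a wait like this, a queued operation would sleep until either word changes instead of waking on a fixed interval. The sketch ignores deadlines and does not let the waiter participate in network IO, which matches the caveat above.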



 Comments   
Comment by Louis Williams [ 05/Apr/23 ]

Some other points brought up in triage on why we wouldn't want to do this:

  • Periodically timing out (like we do in our existing code) can be useful because it allows us to update queueing time metrics periodically rather than only on completion, although we aren't doing that today.
  • Waiting on two atomics at once is a relatively recent Linux kernel addition (the futex_waitv syscall, added in Linux 5.16).
Comment by Connie Chen [ 04/Apr/23 ]

Putting this in triage for Service Arch to consider; StorEx doesn't think this is particularly urgent.
