[SERVER-56125] Scoped wrapper for out-of-line executors Created: 15/Apr/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Participants:

 Description   

SERVER-40722 introduced a wrapper for task executors to ensure all outstanding callbacks are executed before the wrapper is destroyed (defined here). This wrapper simplifies draining all tasks scheduled by a subsystem (e.g., PrimaryOnlyService) before a particular execution point (e.g., the destructor of the subsystem).

This ticket should provide a similar wrapper for instances of out-of-line executors (e.g., ThreadPool). We can also deprecate waitForIdle in favor of using scoped executors.
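As a rough sketch of what such a wrapper could look like (the class name, the std::function-based scheduling hook, and the bookkeeping below are hypothetical, not an existing MongoDB API): the wrapper counts the tasks it forwards to the underlying executor, and its destructor blocks until the count drains to zero.

{code:cpp}
#include <condition_variable>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch only; not the MongoDB API. A scoped wrapper that counts
// the tasks it forwards to an underlying out-of-line executor and blocks in
// its destructor until every one of them has finished running.
class ScopedExecutorSketch {
public:
    // `schedule` stands in for the wrapped executor (e.g., a thread pool):
    // any callable that accepts a task and eventually runs it.
    explicit ScopedExecutorSketch(std::function<void(std::function<void()>)> schedule)
        : _schedule(std::move(schedule)) {}

    // Drains: returns only once the outstanding-task count reaches zero.
    ~ScopedExecutorSketch() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return _outstanding == 0; });
    }

    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_outstanding;
        }
        _schedule([this, task = std::move(task)] {
            task();
            std::lock_guard<std::mutex> lk(_mutex);
            if (--_outstanding == 0)
                _cv.notify_all();
        });
    }

private:
    std::function<void(std::function<void()>)> _schedule;
    std::mutex _mutex;
    std::condition_variable _cv;
    int _outstanding = 0;
};
{code}

A real implementation would presumably hook into the existing out-of-line executor interface rather than a std::function, and would need to define what scheduling after destruction has begun should do; the destructor's predicate-guarded wait is the same pattern discussed in the comments below.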



 Comments   
Comment by Bruce Lucas (Inactive) [ 16/Apr/21 ]

Thanks amirsaman.memaripour, that makes sense.

Comment by Amirsaman Memaripour [ 15/Apr/21 ]

bruce.lucas, I believe the answer to your question depends on the implementation. If we use the same synchronization primitive (i.e., a condition variable) to wait for all scheduled tasks to complete before returning from the ScopedExecutor destructor, we may hit similar issues with buggy C libraries, but the stack trace will look very similar to what we have today (so no impact on debuggability). In my prototype, I'm passing a predicate to the condition variable, which should make it less susceptible to missed notifications.
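For reference, the predicate-guarded wait mentioned above corresponds to the standard std::condition_variable overload that re-checks the condition under the lock; a minimal sketch, with names chosen only for illustration:

{code:cpp}
#include <condition_variable>
#include <mutex>

// Illustration only: the predicate is evaluated under the lock before the
// thread ever blocks and again on every wakeup, so the waiter returns as soon
// as the state it cares about is satisfied, regardless of spurious wakeups.
void waitUntilDrained(std::mutex& mutex,
                      std::condition_variable& cv,
                      const int& outstandingTasks) {
    std::unique_lock<std::mutex> lk(mutex);
    cv.wait(lk, [&] { return outstandingTasks == 0; });
}
{code}

Note that this guards against spurious wakeups and against a notification that fires before the wait begins; it cannot recover a notification dropped while the thread is already blocked, which is the scenario described in the comment below.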

 

Comment by Bruce Lucas (Inactive) [ 15/Apr/21 ]

"We can also deprecate waitForIdle in favor of using scoped executors."

I have a question about how this impacts debuggability in the field, specifically with respect to an issue like SERVER-47554/HELP-18861. In that case we were able to reason about the state of the system simply by collecting stack traces: the producer thread was in waitForIdle called from multiApply, but no worker threads were executing, so a notification must have been missed (due to, it turned out, a bug in the C library). With this new approach, would that bug (or a comparable bug in the synchronization primitives) manifest in a way we could likewise reason about just from stack traces, i.e., would we see a thread waiting for a notification that wasn't going to arrive?
