-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
ALL
-
Service Arch 2022-12-12
ServiceExecutorReserved is designed to tolerate thread spawn failures.
Spawn failures are temporary errors: the OS can be unable to spawn threads for some period of time and regain the ability later.
In the pre-SERVER-70151 ServiceExecutorReserved, schedule() calls placed tasks on a singleton queue and tried to spawn a thread, but did not depend on that spawn succeeding.
If spawns fail for a period of time, the reserved service executor can still hand incoming scheduled tasks to its established pool of reserved workers. When an idle worker starts a loop iteration by receiving a task, it spawns a new worker to replace itself if the worker count is below quota. When it completes a chain of tasks (now called a lease), it decides whether to die or merely go idle, again considering the reserve quota. So workers reproduce only when embarking on a task chain.
The problem:
If spawns fail and the reserve is exhausted, tasks will be queued. Suppose the OS recovers and spawns become possible again. The reserved service executor would only find out when a reserve thread finishes its task chain, goes idle, and then spawns. Until then, queued tasks wait even though new threads could already be created.
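The worker lifecycle and the stall described above can be illustrated with a small single-threaded model. This is only a sketch, not the server's code: `ReservedPoolModel`, `osSpawn`, and every member name are hypothetical, and it assumes that "worker count" means the number of idle workers and that schedule() attempts a spawn only when no worker is idle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical single-threaded model of the pre-SERVER-70151 behavior.
// None of these names come from the server code.
struct ReservedPoolModel {
    std::size_t quota;              // idle workers to keep in reserve
    std::function<bool()> osSpawn;  // stand-in for the OS spawn call; false = failure
    std::size_t idle = 0;           // workers waiting for a task
    std::size_t busy = 0;           // workers running a task chain (lease)
    std::size_t queued = 0;         // tasks waiting for a worker

    ReservedPoolModel(std::size_t q, std::function<bool()> spawn)
        : quota(q), osSpawn(std::move(spawn)) {}

    // Try to add one idle worker. Failure is tolerated, never propagated.
    bool trySpawn() {
        if (!osSpawn()) return false;
        ++idle;
        return true;
    }

    // Enqueue a task, attempt a spawn if nobody is idle (assumption),
    // then hand queued tasks to idle workers.
    void schedule() {
        ++queued;
        if (idle == 0) trySpawn();
        dispatch();
    }

    // A busy worker finishes its lease: it goes idle if the reserve is
    // short, otherwise it exits. Going idle may let it pick up queued work.
    void finishLease() {
        assert(busy > 0);
        --busy;
        if (idle < quota) ++idle;
        dispatch();
    }

private:
    void dispatch() {
        while (queued > 0 && idle > 0) {
            --queued;
            --idle;
            ++busy;
            // Replacement spawns happen only here, when a worker
            // embarks on a task chain.
            if (idle < quota) trySpawn();
        }
    }
};
```

With a quota of 1 and one idle worker, turning spawns off and calling schedule() twice leaves one task running and one task queued. Turning spawns back on changes nothing in the model until finishLease() is called, which is the delay this ticket describes.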
Review of SERVER-70151 discovered this problem, but fixing it was out of scope.
A spawn retry loop, initiated when a spawn failure occurs, would probably mitigate the issue.
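One possible shape for such a retry loop is sketched below, under stated assumptions. `spawnWithRetry` and its parameters are hypothetical and do not exist in the server; the capped exponential backoff is an illustrative choice, and the ticket does not say which thread would run the loop or how long it should keep trying.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call trySpawn until it succeeds or maxAttempts is
// reached, sleeping between attempts with a capped exponential backoff.
// Returns true if a spawn eventually succeeded.
bool spawnWithRetry(const std::function<bool()>& trySpawn,
                    int maxAttempts,
                    std::chrono::milliseconds initialDelay,
                    std::chrono::milliseconds maxDelay) {
    assert(maxAttempts > 0);
    std::chrono::milliseconds delay = initialDelay;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (trySpawn()) return true;       // the OS has recovered
        if (attempt == maxAttempts) break; // no sleep after the last attempt
        std::this_thread::sleep_for(delay);
        delay = std::min(delay * 2, maxDelay);
    }
    return false;
}
```

A caller could pass a lambda that wraps the real spawn call and returns false when it fails. Because the loop blocks while sleeping, it would need to run somewhere that does not hold up schedule() callers or the reserved workers themselves; that placement is left open here.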
- duplicates
-
SERVER-70151 ServiceExecutorSynchronous thread_local-related leaks (revert)
- Closed