- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Networking
- Product Performance
- 120
DESCRIPTION (public; paste into the Jira Description field)
Problem
Every request on SessionWorkflow invokes _yieldPointReached() twice — once pre-receive and once post-response — which calls ServiceExecutor::yieldIfAppropriate() and, when getRunningThreads() > cores (always true on a busy server), issues stdx::this_thread::yield(). On an 8-core ARM64 host running at ~80K requests/sec with 128 connection threads, this produces ~161K sched_yield syscalls/sec plus roughly twice as many voluntary context switches. Off-CPU profiling shows 57.85% of connection-thread wait time is spent in _sched_yield, and on-CPU profiling attributes 6.92% of total CPU to the same symbol. The userspace yield is redundant under modern admission control, which already caps concurrent DB work via execution_admission_tickets — threads past the cap are queued, not spinning — and the Linux CFS scheduler preempts at the sched tick (≤1 ms) for any residual on-core contention.
Solution
Delete the two _yieldPointReached() call sites in session_workflow.cpp (pre-receive at line 532, post-response at line 915) and the now-unused helper at line 574. ServiceExecutor::yieldIfAppropriate() and getRunningThreads() are left in place for any future callers; only the per-request invocations on the SessionWorkflow hot path are removed. The change retires the BF-27452 mitigation, which was written before admission control was as well-integrated on the ingress path: admission now handles concurrency capping structurally, and kernel preemption handles residual CPU fairness — a userspace sched_yield per request is pure overhead on top of both.