[SERVER-74243] Evaluate the implications of pinning threads Created: 21/Feb/23 Updated: 24/Mar/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Amirsaman Memaripour | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | perf-servicearch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Service Arch
|
| Participants: |
| Description |
|
The goal is to pick a set of important workloads and evaluate the implications of setting thread affinity for various MongoDB deployments (e.g., a 3-shard cluster). Based on the outcome of this investigation, we may decide to introduce a new feature to allow pinning threads to cores (see linked issue). |
| Comments |
| Comment by Billy Donahue [ 24/Mar/23 ] |
|
I'm pretty skeptical of that affinity PR and its 2-5% throughput stats. It means that every connection first executes on some CPU and can only be scheduled on that CPU forever, so a connection has a permanent set of "bunkmates" that also live on that single CPU. If you get stuck sharing a CPU with another compute-hogging connection, you will be blocked behind it more often than you would be if Linux were free to schedule you anywhere, adaptively. I imagine this will also lead to suboptimal CPU utilization: pending work is doomed to use only the CPU it started on, regardless of the relative availability of other CPUs in the system. The scheduler should be making reasonable decisions. If there's something we can do to hint its heuristics, we should, but sched_setaffinity feels like a blunt instrument putting handcuffs on the scheduler. I don't believe the feature is intended for this kind of binding of general work queues to CPUs. It's more of a realtime / compute-intensive thing: you have to move all other processes off of those CPUs and dedicate them to a carefully controlled set of coordinating tasks, which is not what we are doing here. |
| Comment by Jason Chan [ 28/Feb/23 ] |
|
We aren't prioritizing this currently because pinning threads is not expected to be a performance optimization for general workloads; the benefits are highly dependent on the customer's workload. We would prefer to investigate opportunities to optimize our code to avoid thread migrations in the common case. |
| Comment by Amirsaman Memaripour [ 22/Feb/23 ] |
|
We should also consider the implications of hyper-threading and vCPUs in this evaluation. |