[SERVER-69028] Collect thread migrations in FTDC Created: 21/Aug/22 Updated: 02/Feb/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Andrew Shuvalov (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Service Arch
|
| Participants: |
| Description |
Background
Thread migrations happen when the kernel load balancer moves a scheduled process or thread from one runqueue to another because the inefficiency of the unbalanced "load" is higher than the migration cost. A high frequency of thread migrations indicates that the thread model is suboptimal: one or more cores are left idle while others have a runqueue of jobs ready to run. This may happen in at least two cases: high lock contention and/or improper use of thread pools (too many thread pools, thread pools too large or too small, etc.).

Motivation
We know our thread model is suboptimal. We have a thread per request model with too many auxiliary thread pools. The approximate roadmap to fix that is:
For all of those tasks we need proper measurements. Benchmarking the code during development is time-consuming and not necessary. Using thread migrations as a quick negative signal is easy and more productive. A low frequency of thread migrations is not sufficient to indicate that the thread model is good, but a high frequency is always bad. When this signal is good, others can be used (profiling, lock contention measurements, etc.).

Approximate design
The current core could be detected by the code:
This claims that `__rdtscp` is not a serializing instruction; it only "waits until all previous instructions have executed". This also confirms this statement. Thus we probably should not worry about performance implications. CC amirsaman.memaripour@mongodb.com to confirm.

It is very convenient to use a thread local to track the last observed core, because a thread-local variable migrates together with the thread to the new core. The implementation should query the current core sufficiently often and increment the thread-local counter to accumulate the observed migrations. In production, we may observe involuntary context switches to the tune of 200k QPS, which hints that it will be sufficient to query the current core inside a new listener on `_onContendedLock()` and then on `_onUnlock()`. Perhaps it will be cheaper to add a callback `_onContendedUnlock()`, because a migration is unlikely to happen if the current thread was not put to sleep. Remember, a thread migration happens only when the thread is on a runqueue.

Collection
Collection requirements are:
We should accumulate the current migration count in a thread local and flush it when the opCtx is created and destroyed. The longer-shot task will be to bucket this count by command and by user. This will give us insight into which commands and which users are associated with the most thread migrations. It may also be used for better user isolation in the future: the user creating the most thread migrations should be the first to be throttled. Flushing this counter to the opCtx is easier as long as we use the thread per connection model; it will break later. For the asynchronous model later, we will need to flush on ThreadClient destruction, and then when the thread is recycled in the thread pool. This is conditional on observing this as a useful signal in production.

Roadmap
1. Implement a simple solution (this ticket) |
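The design described above (query the current core, compare it with a thread-local "last core", count the changes, flush the count at operation boundaries) could be sketched roughly as follows. This is a minimal illustration, not the ticket's actual code: `currentCore()`, `MigrationTracker`, and `threadMigrationTracker()` are hypothetical names, and decoding the core number from the low 12 bits of the `__rdtscp` auxiliary value assumes the Linux convention for `IA32_TSC_AUX` on x86. On other platforms the sketch simply reports core 0.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Hypothetical helper: returns the core the calling thread is running on.
// Assumption: Linux stores the CPU number in the low 12 bits of IA32_TSC_AUX,
// which __rdtscp returns through its output argument.
inline unsigned currentCore() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned aux = 0;
    (void)__rdtscp(&aux);  // the timestamp itself is not needed here
    return aux & 0xfffu;
#else
    return 0;  // placeholder for platforms without __rdtscp
#endif
}

// Hypothetical per-thread accumulator of observed migrations.
class MigrationTracker {
public:
    // Records one observation; counts a migration if the core changed
    // since the previous observation on this thread.
    void observe(unsigned core) {
        if (_hasLast && core != _lastCore) {
            ++_migrations;
        }
        _lastCore = core;
        _hasLast = true;
    }

    // Intended to be called from hooks such as the lock listener callbacks.
    void sample() {
        observe(currentCore());
    }

    // Returns the accumulated count and resets it, e.g. when an operation
    // context is created or destroyed.
    std::uint64_t flush() {
        std::uint64_t n = _migrations;
        _migrations = 0;
        return n;
    }

private:
    unsigned _lastCore = 0;
    bool _hasLast = false;
    std::uint64_t _migrations = 0;
};

// One tracker per thread; it travels with the thread across cores.
inline MigrationTracker& threadMigrationTracker() {
    thread_local MigrationTracker tracker;
    return tracker;
}
```

A caller would invoke `threadMigrationTracker().sample()` at the chosen hook points and add the result of `flush()` to whatever aggregate counter is exported. Keeping `observe()` separate from `sample()` lets the counting logic be exercised without depending on the scheduler.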
| Comments |
| Comment by Bruce Lucas (Inactive) [ 24/Aug/22 ] |
|
Very good, thanks for the confirmation. |
| Comment by Andrew Shuvalov (Inactive) [ 23/Aug/22 ] |
|
bruce.lucas@mongodb.com yes, not in this ticket. The idea of per-user counters is not to expose them in FTDC but to use them in our future user isolation implementation. As we don't have one yet, there is no point in adding them now. We need to assemble a list of ~3 different metrics to pinpoint abusive users for user isolation. When we decide to do it, thread migrations will be one of those 3. The per-op metric is strictly for manual investigations and can be exposed with an additional verbose field in `serverStatus`. We should never have this kind of granularity in default FTDC. So yes, just 1 new metric. |
| Comment by Bruce Lucas (Inactive) [ 23/Aug/22 ] |
|
Adding a single counter to FTDC sounds reasonable. I would be concerned about adding anything per-command or per-user to FTDC because of the volume of counters it could create. |