[SERVER-30114] Monitor cumulative time spent in tcmalloc spin lock in serverStatus Created: 12/Jul/17 Updated: 30/Oct/23 Resolved: 09/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.20, 3.4.14, 3.6.4, 3.7.2 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Mark Benvenuto |
| Resolution: | Fixed | Votes: | 25 |
| Labels: | SWDI |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6, v3.4, v3.2 |
| Sprint: | Platforms 2018-01-29, Platforms 2018-02-12 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
We sometimes encounter allocator bottlenecks that currently require asking users to collect stack traces with gdb to diagnose. We could diagnose these bottlenecks from FTDC data if we counted cumulative time spent waiting for the primary lock involved. Typical stacks for such a bottleneck look like this:
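(The stack traces originally included here are not reproduced in this export. Purely as an illustration, assuming a gperftools/tcmalloc build, the contended threads typically look something like the following, parked in the spin lock's delay path while refilling a thread cache:)

```
#0  base::internal::SpinLockDelay        <- futex/sleep wait on the contended spin lock
#1  SpinLock::SlowLock
#2  tcmalloc::CentralFreeList::RemoveRange
#3  tcmalloc::ThreadCache::FetchFromCentralCache
#4  tc_new (operator new)
#5  ... allocating code in mongod ...
```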
|
| Comments |
| Comment by Githook User [ 26/Feb/18 ] | |
|
Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)
Message: (cherry picked from commit ba957823288d674f34c8345059f35f34cb4dd055) | |
| Comment by Githook User [ 20/Feb/18 ] | |
|
Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)
Message: (cherry picked from commit ba957823288d674f34c8345059f35f34cb4dd055) | |
| Comment by Githook User [ 20/Feb/18 ] | |
|
Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)
Message: (cherry picked from commit ba957823288d674f34c8345059f35f34cb4dd055) | |
| Comment by Githook User [ 09/Feb/18 ] | |
|
Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)
Message: | |
| Comment by Bruce Lucas (Inactive) [ 24/Jan/18 ] | |
|
Once we introduce the metric we can see whether those workloads generate significant tcmalloc spinlock contention; if they do, we know they are useful tests for gauging any performance impact of the metric itself. | |
| Comment by Andrew Morrow (Inactive) [ 24/Jan/18 ] | |
|
I meant it the other way around - I want to use those workloads to ensure that introducing the metric has acceptable (hopefully negligible) performance overhead before we propose it upstream and/or merge it ourselves. | |
| Comment by Bruce Lucas (Inactive) [ 24/Jan/18 ] | |
|
Once we've introduced the metric we can check those workloads to see if they're generating a signal. | |
| Comment by David Daly [ 24/Jan/18 ] | |
|
We've seen interesting tcmalloc-related behavior with the mixed_* workloads in core_workloads in the past. They run a large number of threads, with groups of threads doing different simple operations (find, update, delete, ...). | |
| Comment by Bruce Lucas (Inactive) [ 23/Jan/18 ] | |
|
Simple performance test. Java load generator (attached) run as follows:
This is 200 threads all inserting tiny (_id-only) documents in batches of 1000. Standalone, no journal, no checkpoints, running on 12 cores. Alternating 30-second runs with and without spinlock monitor, restarting mongod each run:
As the spinlock monitor stats show, we're hitting the spinlock moderately hard because we have 200 threads on 12 cores filling the cache with tiny documents; on average at all times we have 5-10 threads waiting for the spinlock. No visible difference. Numerically, total number of inserts in each 30-second run:

with spinlock monitor: 21157000 20973000 21047000 21267000 21373000 20563000 21208000 20299000 20714000 21007000
without spinlock monitor: 20899000 21677000 20777000 21190000 20996000 21061000 21197000 20857000 21705000 21330000

The average with the spinlock monitor is about 1% lower than without, which is about 0.6 standard deviations, so it's in the noise. | |
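(The attached load generator is a Java program and is not reproduced here. Purely as a sketch of the workload shape, a rough C++ equivalent using the mongocxx driver might look like the following; the connection string, database/collection names, and run length are assumptions, not taken from the attachment.)

```cpp
// Illustrative sketch only: the attached generator is a Java program. This is
// a rough C++ equivalent (mongocxx driver) showing the workload shape:
// 200 threads inserting _id-only documents in batches of 1000.
#include <chrono>
#include <thread>
#include <vector>

#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/oid.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

namespace {
constexpr int kThreads = 200;     // concurrent client threads
constexpr int kBatchSize = 1000;  // documents per insert_many
constexpr auto kRunLength = std::chrono::seconds(30);
}  // namespace

int main() {
    mongocxx::instance inst{};  // driver bootstrap, one per process

    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([] {
            // mongocxx::client is not thread-safe; use one per thread.
            mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
            auto coll = client["test"]["c"];

            const auto deadline = std::chrono::steady_clock::now() + kRunLength;
            while (std::chrono::steady_clock::now() < deadline) {
                // Build a batch of tiny documents containing only a generated _id.
                std::vector<bsoncxx::document::value> batch;
                batch.reserve(kBatchSize);
                for (int i = 0; i < kBatchSize; ++i) {
                    bsoncxx::builder::basic::document doc;
                    doc.append(bsoncxx::builder::basic::kvp(
                        "_id", bsoncxx::types::b_oid{bsoncxx::oid{}}));
                    batch.push_back(doc.extract());
                }
                coll.insert_many(batch);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return 0;
}
```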
| Comment by Bruce Lucas (Inactive) [ 22/Dec/17 ] | |
|
I've attached a new POC (spinlock2.diff) that measures the actual time spent in SpinLockDelay rather than just accumulating the supplied sleep times, because the Linux implementation uses a futex with wakeups and so may return before the supplied sleep time elapses. This requires getting the current time, but since that only happens when we also do an expensive delay, I think it's unlikely to have a performance impact. | |
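(A minimal sketch of that idea, not the actual spinlock2.diff: time the expensive delay path itself and accumulate the elapsed nanoseconds into a process-wide counter. The names below are illustrative.)

```cpp
// Sketch only: wrap tcmalloc's expensive spin-lock delay (futex wait / sleep)
// with a monotonic-clock measurement and accumulate the elapsed time globally.
#include <atomic>
#include <cstdint>
#include <time.h>

// Cumulative nanoseconds all threads have spent blocked in the delay path.
// Relaxed ordering is fine: this is a statistic, not a synchronization guard.
std::atomic<uint64_t> g_spinlockTotalDelayNanos{0};

static inline uint64_t nowNanos() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull + ts.tv_nsec;
}

// Hypothetical wrapper around the expensive delay inside SpinLockDelay. The
// clock is read only on this slow path, so the cost is amortized against an
// already-expensive wait, and the measured time is the real wait even when
// the futex wakes up before the requested sleep elapses.
template <typename DelayFn>
void timedSpinLockDelay(DelayFn&& doExpensiveDelay) {
    const uint64_t start = nowNanos();
    doExpensiveDelay();  // e.g. futex wait with the supplied timeout
    g_spinlockTotalDelayNanos.fetch_add(nowNanos() - start, std::memory_order_relaxed);
}
```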
| Comment by Bruce Lucas (Inactive) [ 21/Dec/17 ] | |
|
I think that we can get 90% of the diagnostic value with something much simpler.
I've attached a POC implementation; it's only a few lines of code. Here's an example in use:
Left is the current default executor; right is the adaptive executor. We see some allocator contention in the first run, although it maxes out at about 37 threads out of 145 so it's probably not impacting performance much. Would it be feasible to put this monitoring in? | |
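(For reference, a hedged sketch of how such a cumulative counter could be surfaced through serverStatus so FTDC picks it up. The section and field names and the counter variable are illustrative, and the ServerStatusSection plumbing is simplified, assuming the interface from this era of the server.)

```cpp
// Sketch only: expose the cumulative spin-lock delay counter in serverStatus.
#include <atomic>
#include <cstdint>

#include "mongo/db/commands/server_status.h"

namespace mongo {

// Updated by the instrumented spinlock delay path (see earlier sketches).
extern std::atomic<uint64_t> g_spinlockTotalDelayNanos;

class TCMallocSpinLockSection : public ServerStatusSection {
public:
    TCMallocSpinLockSection() : ServerStatusSection("tcmallocSpinLock") {}

    bool includeByDefault() const override {
        return true;  // cheap to report, so safe to include by default
    }

    BSONObj generateSection(OperationContext* opCtx,
                            const BSONElement& configElement) const override {
        BSONObjBuilder builder;
        // Cumulative counter: consumers diff successive samples; the rate of
        // change (ns waited per ns elapsed) approximates the average number
        // of threads waiting on the spinlock over the sampling interval.
        builder.appendNumber(
            "totalDelayNanos",
            static_cast<long long>(
                g_spinlockTotalDelayNanos.load(std::memory_order_relaxed)));
        return builder.obj();
    }
} tcmallocSpinLockSection;  // constructing a static instance registers the section

}  // namespace mongo
```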
| Comment by Mark Benvenuto [ 18/Dec/17 ] | |
|
There are a few reasons why I do not want to add stats tracking on a permanent basis to MongoDB:
| |
| Comment by Bruce Lucas (Inactive) [ 20/Nov/17 ] | |
|
mark.benvenuto, acm, my apologies, but I think I neglected to bring this ticket up when we were reviewing tcmalloc work for 3.6. My concern is that we made a fairly major change in the way we use tcmalloc (disabling markThreadIdle), with an option to re-enable it if a particular customer workload proves to need it, but we don't have a good way to monitor whether tcmalloc has become a bottleneck. In my experience with customer issues, tcmalloc typically bottlenecks here but leaves no trace in the metrics, making it difficult to diagnose. This metric would fix that. I realize that it is probably too tight to get this into 3.6.0, particularly since we need to make sure this monitoring doesn't introduce performance problems itself, but would it be possible to do this early in 3.7 and backport it to 3.6.small? | |
| Comment by Mark Benvenuto [ 12/Jul/17 ] | |
|
Prototype code that tracks tcmalloc SpinLock spinning by spinlock type: https://github.com/markbenvenuto/mongo/commit/eda17993ec4638d38dc2c8d0e105082219c124e5 |