[SERVER-31417] Improve tcmalloc when decommitting large amounts of memory Created: 05/Oct/17 Updated: 30/Jan/24 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Performance Team |
| Resolution: | Unresolved | Votes: | 43 |
| Labels: | RF36, former-quick-wins, perf-effort-xlarge, perf-improve-product, perf-urgency-asap, perf-value-essential |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: |
Product Performance
|
| Sprint: | Dev Tools 2019-05-06, Dev Tools 2019-04-22 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
tcmalloc may occasionally release large amounts of pageheap free memory to the kernel by calling madvise. This can take seconds when the amount of memory involved is many GB. A tcmalloc internal lock is held while this happens, so this can potentially stall many threads, causing widespread latency spikes. There is no direct metric that diagnoses this (
|
| Comments |
| Comment by Ian Springer [ 27/Oct/23 ] |
|
We are also hitting this issue in production and would like to know whether there has been any progress. We tried increasing the release rate to 5 and 10, and found that it prevented the stalls but also significantly impacted query latency. Setting it to 2 appeared to be a good compromise, but we haven't had a chance to test it extensively yet. |
| Comment by Jose Ledesma [ 21/Nov/22 ] |
|
We are hitting this issue (the server stalls while freeing a large amount of page_heap_free_bytes). Has there been any progress? Do we know whether increasing TCMALLOC_RELEASE_RATE would help mitigate it? |
| Comment by Henrik Ingo (Inactive) [ 09/Aug/19 ] |
|
We recently learned that TCMALLOC_RELEASE_RATE can be used to make tcmalloc release memory more frequently to the OS. We haven't tested anything that would resemble a repro of this issue, but I could speculate that using TCMALLOC_RELEASE_RATE=10 could cause tcmalloc to call madvise more frequently and in smaller chunks.
|
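To illustrate the "more frequently and in smaller chunks" idea from the comment above, here is a minimal sketch in Python. It is not tcmalloc's implementation: `release_fn` is a hypothetical stand-in for the madvise call, the `threading.Lock` stands in for the allocator's internal lock, and both function names are made up for this example. The point is only that the lock is held per chunk, so waiting threads get a chance to run between chunks instead of stalling for the whole release.

```python
import threading


def chunk_spans(total_bytes, chunk_bytes):
    """Yield (offset, length) spans covering total_bytes in pieces of at most chunk_bytes."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length


def release_in_chunks(total_bytes, chunk_bytes, release_fn, lock):
    """Call release_fn once per span, holding the lock only for that span.

    release_fn is a hypothetical stand-in for madvise; lock is a stand-in
    for the allocator's internal lock. Returns the number of calls made.
    """
    calls = 0
    for offset, length in chunk_spans(total_bytes, chunk_bytes):
        with lock:  # other threads can acquire the lock between spans
            release_fn(offset, length)
        calls += 1
    return calls


if __name__ == "__main__":
    MiB = 1024 ** 2
    GiB = 1024 ** 3
    sizes = []
    calls = release_in_chunks(
        4 * GiB, 256 * MiB, lambda off, length: sizes.append(length), threading.Lock()
    )
    print(f"{calls} calls, largest single release: {max(sizes) // MiB} MiB")
```

The trade-off reported elsewhere in this thread still applies: many small releases shorten each stall but add per-call overhead, so the chunk size would need to be tuned against query latency.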
| Comment by Bruce Lucas (Inactive) [ 24/Oct/18 ] |
|
A related effect is that mongod's reluctance to release memory to the o/s in a timely manner can cause two additional problems:
I mention these issues because I think a fix for this ticket is likely to help with both of those issues. |
| Comment by Ian Whalen (Inactive) [ 05/Aug/18 ] |
|
ping mark.benvenuto |
| Comment by Bruce Lucas (Inactive) [ 21/Mar/18 ] |
|
This has been seen by another customer on 3.6.1 in |
| Comment by Bruce Lucas (Inactive) [ 19/Oct/17 ] |
|
I wonder if we should surface TCMALLOC_AGGRESSIVE_DECOMMIT as a server parameter, possibly changeable at runtime, so users don't have to set an environment variable and restart to test the impact of enabling it. mark.benvenuto, acm? |
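For anyone who wants to test this today, the environment-variable route mentioned above can be scripted. The sketch below only builds the environment for a child process; the helper name `tcmalloc_env` is hypothetical, and the use of "1"/"0" as the boolean value for TCMALLOC_AGGRESSIVE_DECOMMIT is an assumption about how the variable is parsed. The variables are read at process start, so a restart of mongod is still required.

```python
import os


def tcmalloc_env(release_rate=None, aggressive_decommit=None, base_env=None):
    """Return a copy of the environment with the tcmalloc variables set.

    Hypothetical helper for illustration. Settings left as None are not
    added, so the allocator's defaults stay in effect for them.
    """
    env = dict(os.environ if base_env is None else base_env)
    if release_rate is not None:
        env["TCMALLOC_RELEASE_RATE"] = str(release_rate)
    if aggressive_decommit is not None:
        # Assumption: "1"/"0" is accepted as a boolean value.
        env["TCMALLOC_AGGRESSIVE_DECOMMIT"] = "1" if aggressive_decommit else "0"
    return env


if __name__ == "__main__":
    env = tcmalloc_env(release_rate=2, aggressive_decommit=True)
    for name in ("TCMALLOC_RELEASE_RATE", "TCMALLOC_AGGRESSIVE_DECOMMIT"):
        print(name, "=", env[name])
    # To launch a server with these settings (path and arguments are
    # placeholders, not taken from this ticket):
    # import subprocess
    # subprocess.Popen(["mongod", "--config", "/path/to/mongod.conf"], env=env)
```

A runtime-settable server parameter, as proposed above, would remove the restart from this loop.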