[SERVER-16773] Performance degradation due to TCMalloc scalability Created: 08/Jan/15 Updated: 03/Dec/21 Resolved: 26/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Storage |
| Affects Version/s: | 2.8.0-rc4 |
| Fix Version/s: | 3.0.0-rc7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | John Page | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | MongoDB using the TCMalloc memory allocator. |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Description |
|
Performance degrades because TCMalloc scales poorly with a very large number of threads. After a few minutes of running with hundreds of threads, the large majority of CPU time is spent scavenging memory. Because this often happens inside critical sections, system throughput may degrade by an order of magnitude or more. Increasing the TCMalloc thread cache to its maximum of 1 GB does not avoid the problem. Using the system allocator does, but it costs performance in most other cases. |
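For anyone reproducing the thread cache experiment, here is a minimal sketch of one way to set the cache budget before starting the server. It assumes the TCMalloc build in use reads the gperftools environment variable `TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES` at startup; whether a given mongod binary honors it is not confirmed by this ticket. The helper name `env_with_thread_cache` is hypothetical.

```python
import os

# gperftools environment variable for the total thread cache budget.
# Assumption: the TCMalloc linked into the server reads it at startup.
TCMALLOC_CACHE_ENV = "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"
ONE_GIB = 1 << 30  # the 1 GB maximum given in the description


def env_with_thread_cache(cache_bytes, base_env=None):
    """Return a copy of the environment with the thread cache budget set.

    Hypothetical helper: values above 1 GiB are clamped, because the
    description gives 1 GB as the maximum.
    """
    if cache_bytes <= 0:
        raise ValueError("cache_bytes must be positive")
    env = dict(os.environ if base_env is None else base_env)
    env[TCMALLOC_CACHE_ENV] = str(min(cache_bytes, ONE_GIB))
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.Popen` when launching the server. Per the description, even the 1 GB setting did not avoid the degradation, so this is useful mainly for confirming that result.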
| Comments |
| Comment by deyukong [ 23/Dec/17 ] |
|
Agree with @Igor Canadi. Btw, changing tcmalloc's max_thread_cache offers no help. |
| Comment by Igor Canadi [ 06/Oct/15 ] |
|
The same issue was observed by Ceph: https://ceph.com/planet/the-ceph-and-tcmalloc-performance-story/ Switching to jemalloc helped them, too. |
| Comment by Igor Canadi [ 06/Oct/15 ] |
|
We encountered a similar issue with MongoRocks. 35% of the CPU was being spent in tcmalloc, usually in functions related to CentralFreeList (meaning that the thread-local cache couldn't fulfill the request). There were a lot of context switches, which indicates lock contention, likely in CentralFreeList. We were running version 3.0.6 with Eliot's latest commit, meaning that the cache size was configured to be 1GB. After switching to jemalloc, CPU spent on malloc/free went down to ~4% and latency improved dramatically. |
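The context-switch observation above can be checked on Linux by reading the per-thread counters in `/proc`. The sketch below is an illustration, not the method used in this comment: it sums `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` over all threads of a process. The function names are hypothetical. A voluntary count that grows quickly is consistent with threads blocking on locks, but it does not prove it.

```python
import os

_KEYS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_ctxt_switches(status_text):
    """Extract the context-switch counters from the text of a /proc status file."""
    counts = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in _KEYS:
            counts[key] = int(value.strip())
    return counts


def process_ctxt_switches(pid):
    """Sum the counters over every thread of `pid` (Linux only)."""
    totals = dict.fromkeys(_KEYS, 0)
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                counts = parse_ctxt_switches(f.read())
        except OSError:
            continue  # the thread exited while we were iterating
        for key, value in counts.items():
            totals[key] += value
    return totals
```

Sampling `process_ctxt_switches(pid)` twice, a few seconds apart, and taking the difference gives a rate that can be compared between allocators.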
| Comment by Githook User [ 28/Jan/15 ] |
|
Author: Eliot Horowitz (erh) <eliot@10gen.com> Message: (cherry picked from commit 9da40029fef37df8d33218101ffa2ff22d94a2da) |
| Comment by Githook User [ 28/Jan/15 ] |
|
Author: Eliot Horowitz (erh) <eliot@10gen.com> Message: |
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Geert Bosch (GeertBosch) <geert@mongodb.com> Message: (cherry picked from commit c30688f704e3fbde4ee83aa2f45a6d79900f10c9) |
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Geert Bosch (GeertBosch) <geert@mongodb.com> Message: |
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: This change removes ScopedLock from the RAII lock objects' hierarchy. This I did not see any significant performance improvement, but next change (cherry picked from commit fe3e92d4257b30f01b62d4ef941686b7e0138a8c) |
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: This change removes ScopedLock from the RAII lock objects' hierarchy. This I did not see any significant performance improvement, but next change |
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Eliot Horowitz (erh) <eliot@10gen.com> Message: (cherry picked from commit c4ec7db6ce25e7147ae5f46ac0a7f6a52a0b4c3e) |
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Eliot Horowitz (erh) <eliot@10gen.com> Message: |
| Comment by Githook User [ 22/Jan/15 ] |
|
Author: Geert Bosch (GeertBosch) <geert@mongodb.com> Message: |
| Comment by John Page [ 19/Jan/15 ] |
|
Do rc4 and rc5 show a difference? |
| Comment by John Page [ 19/Jan/15 ] |
|
If you want me to retest I'm happy to do so. 2.8-rc5? |
| Comment by Geert Bosch [ 19/Jan/15 ] |
|
I just redid a run with a build of vanilla rc5 without any patches, and in a 1500-second run I didn't see any performance anomalies. I will attach the results graph. Server build info: |
| Comment by John Page [ 19/Jan/15 ] |
|
I don't have any answers for that one. |
| Comment by Geert Bosch [ 19/Jan/15 ] |
|
OK, thanks. I'll resolve that; however, it does not seem to invalidate the results so far. Do you have any idea why the first 450 seconds of a 600-second run would look different from the first 450 seconds of a 1500-second run? |
| Comment by John Page [ 17/Jan/15 ] |
|
What you are seeing is what happens when you don't have enough client-side file handles. |
| Comment by John Page [ 17/Jan/15 ] |
|
That's the C driver code for not enough available file handles. |
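The file handle shortage described here is usually governed by the per-process open-file limit (`RLIMIT_NOFILE`) on the client machine. The sketch below shows one way to inspect and raise the soft limit from a Python process. It illustrates the general mechanism only; the load generator in this ticket uses the C driver, and the helper name `ensure_fd_limit` is hypothetical.

```python
import resource


def ensure_fd_limit(needed):
    """Raise the soft open-file limit toward `needed`; return the resulting soft limit.

    The soft limit can only be raised as far as the hard limit, so the
    caller should check the returned value against what it needs.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < needed:
        # Never ask for more than the hard limit allows.
        target = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

For a 1000-thread run, assuming one socket per client thread plus some headroom for other files, a call such as `ensure_fd_limit(1000 + 64)` before starting the threads would show whether the limit is high enough. The headroom figure is an assumption.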
| Comment by Geert Bosch [ 17/Jan/15 ] |
|
There seems to be a problem with the load generator: the first 500 seconds look much better if I specify a test duration of 1500 seconds. There is also an issue with threads not always completing: if I specify 1000 threads, the test might finish with 250 or so. The error message is: "Failed to read 4 bytes from socket. Child Quitl." The test system has plenty of resources and is never overloaded. Anyway, I now get good throughput with regular journaling/yielding etc. enabled for a 1500-second (25 min) test run. |
| Comment by David Daly [ 16/Jan/15 ] |
|
I tried John's workload against RC4. Interesting behavior. After about 3-4 minutes, the performance increases. I've labelled that as time B. Test starts at A. Seems stable after B. |
| Comment by David Daly [ 16/Jan/15 ] |
|
Short summary:
|
| Comment by John Page [ 16/Jan/15 ] |
|
It will depend on what version of that code you have. |
| Comment by David Daly [ 16/Jan/15 ] |
|
I think the spike in inserts is an artifact of the restart and the workload reconnection. I think the workload fails to get the test specification from the database, and then runs the default workload for a little while, until it succeeds in querying the server for the current test configuration. |
| Comment by David Daly [ 16/Jan/15 ] |
|
Looking at the stats, I see the connections go to zero when I restart it, but then they go back to where they were after I resume the workload. |
| Comment by David Daly [ 16/Jan/15 ] |
|
In case it's interesting, here's a shot of the spike in inserts after restart.
Also, the opcounters for query and update seem higher than documents returned and documents updated. |
| Comment by John Page [ 16/Jan/15 ] |
|
I don't think the workload will reconnect if you restart the server, though. |
| Comment by David Daly [ 16/Jan/15 ] |
|
Thanks john.page. Trying a slightly different tack now: doing a long-duration run, and then suspending the workload process to restart the server. |
| Comment by John Page [ 16/Jan/15 ] |
|
Query and update only access records inserted in that run, so a query/update-only workload isn't doing much. |
| Comment by David Daly [ 16/Jan/15 ] |
|
The workload can be adjusted between inserts, queries, and updates by adjusting the entry in the testsrv.test collection on the system under test. There is one record with _id : "loadtest". The number of operations of each type is proportional to the value in each field. Set "insert" : 0 to stop all inserts. Fields can be updated while the test is running. Some initial observations from adjusting the running load (all with 512 threads)
|
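The proportional weighting described in the comment above can be sketched as follows. This is an illustration of the idea only: the load generator's actual selection logic is not shown in this ticket. The `_id` value `"loadtest"` and the `"insert"` field come from the comment; the `"query"` and `"update"` field names are assumptions, and both function names are hypothetical.

```python
import random


def expected_mix(weights):
    """Return the fraction of operations each type should receive."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {op: w / total for op, w in weights.items()}


def pick_operation(weights, rng=random):
    """Choose the next operation type with probability proportional to its weight."""
    ops = [op for op, w in weights.items() if w > 0]
    if not ops:
        raise ValueError("at least one weight must be positive")
    return rng.choices(ops, weights=[weights[op] for op in ops], k=1)[0]


# Hypothetical contents of the single record in testsrv.test.
# Only "_id" and "insert" are named in the comment; the rest is assumed.
loadtest = {"_id": "loadtest", "insert": 0, "query": 1, "update": 3}
weights = {k: v for k, v in loadtest.items() if k != "_id"}
```

With these weights, `expected_mix(weights)` gives a quarter of the operations to queries and three quarters to updates, and `pick_operation(weights)` never returns `"insert"`, which matches the note that setting `"insert" : 0` stops all inserts.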
| Comment by Geert Bosch [ 15/Jan/15 ] |
|
Interestingly, jemalloc doesn't get as bad but shows the same shape. LockLessInc malloc is similar as well. |
| Comment by Geert Bosch [ 15/Jan/15 ] |
|
TCMalloc 2.0 has similar behavior as well. |
| Comment by Geert Bosch [ 15/Jan/15 ] |
|
TCMalloc 2.4 has the same behavior as the 2.2 version we are using now. Will be trying 2.0 next. |
| Comment by Daniel Pasette (Inactive) [ 10/Jan/15 ] |
|
John, can you include the git hash of the mongod binaries you're testing as well as the parameters you're passing to mongod? |