Performance degradation due to limited TCMalloc scalability with a very large number of threads. After a few minutes of running with hundreds of threads, the large majority of CPU time is spent scavenging memory. Because this scavenging often occurs inside critical sections, system throughput may degrade by an order of magnitude or more. Increasing the TCMalloc thread cache to its maximum of 1 GB does not avoid this problem. Switching to the system allocator does avoid it, but at the cost of lower performance in most other cases.
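
For reference, the thread cache increase described above might be applied as in the following minimal sketch. It assumes the gperftools build of TCMalloc, which reads the total thread cache bound from the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable at startup; other TCMalloc builds, or products that embed TCMalloc, may expose this limit through a different setting.

```shell
# Assumption: gperftools TCMalloc, which reads this variable at process start.
# 1 GB expressed in bytes (1024 * 1024 * 1024 = 1073741824).
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$((1024 * 1024 * 1024))
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES

# Show the value that child processes will inherit.
echo "$TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"
```

The variable must be exported before the affected process is launched. As noted above, raising the limit this way is not expected to resolve the degradation.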