[SERVER-54747] Vendor tcmalloc into MongoDB's third party sources Created: 24/Feb/21 Updated: 01/Mar/23 |
|
| Status: | Blocked |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Tammy Bailey (Inactive) | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Storage Engines
|
||||||||
| Sprint: | Storage - Ra 2021-03-08, Storage - Ra 2021-03-22 | ||||||||
| Participants: | |||||||||
| Story Points: | 5 | ||||||||
| Description |
|
This ticket is to add the new TCMalloc to the MongoDB repository and integrate with SCons. The source code will be located in the directory src/third_party/tcmalloc. The TCMalloc source contains BUILD.bazel files for use with the Bazel build system; these Bazel files should be removed and the appropriate SCons files added. |
| Comments |
| Comment by Andrew Shuvalov (Inactive) [ 25/Jul/21 ] | |||||||||||||||||||||
|
Added a comment on performance here. | |||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 19/Mar/21 ] | |||||||||||||||||||||
|
tammy.bailey I agree with bruce.lucas that it is fine to leave it in while the evaluation proceeds. | |||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 19/Mar/21 ] | |||||||||||||||||||||
|
tammy.bailey, I don't think there's any need to do that. I think having that alternative implementation to, as you say, explore its potential, has been helpful and has generated useful discussion as input to the design decision when/if we're ready to make that decision. | |||||||||||||||||||||
| Comment by Tammy Bailey (Inactive) [ 19/Mar/21 ] | |||||||||||||||||||||
|
acm I think there's been a rather large misunderstanding about the heap profiler implementation provided in this build. We had some extra bandwidth, so provided an implementation of the heap profiler using the tcmalloc allocation sampling API both in the interest of producing a complete build and to explore the potential of the cleaner implementation, where the actual library code wouldn't need to be altered. This change was outside the scope of this ticket and was definitely not a design decision for the heap profiler by any means. The new tcmalloc library is still pending performance evaluation, so it's uncertain whether we'll even get to a point where a new heap profiler implementation will be needed. I'm wondering if perhaps we should back the heap profiler changes out of this build to avoid future confusion? | |||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 18/Mar/21 ] | |||||||||||||||||||||
|
bruce.lucas - The google/tcmalloc maintainers have made pretty clear that they don't intend to maintain ABI: https://github.com/google/tcmalloc/issues/2. Now ABI and API are admittedly different - the former is a much stricter bar. But an explicit statement not to keep ABI stable certainly suggests a willingness to iterate API and implementation/behavior. I'll note also that unlike the gperftools/tcmalloc, the google/tcmalloc project hasn't made any tags, created any branches, or issued any releases. It leads me to the conclusion that they intend to treat this project much more as something that lives at HEAD to reflect the state of the software within google. At the very least it probably means the end of offering or supporting --use-system-tcmalloc for the new tcmalloc. | |||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 18/Mar/21 ] | |||||||||||||||||||||
|
For future reference, I'll record another question I have about the built-in heap profiling API for the new tcmalloc: do we have an understanding of the long-term stability of the interface and function from an upstream perspective? | |||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 18/Mar/21 ] | |||||||||||||||||||||
|
Just a quick note, but IIUC the plan here is to replace the existing heap profiler (mongodb written one in src/mongo/util) with the heap profiler from the new tcmalloc. There are a few issues there. The first is that since we know we can't use the new tcmalloc on all platforms, we can't actually remove mongo the heap profiler: we will still need it on those platforms where the new tcmalloc can't (yet?) be used. Another concern is that the new tcmalloc profiler is based on using the abseil backtracing facilities. One of the advantages of the current mongodb heap profiler is that it re-uses the mongodb side backtracing facilities that we developed over the past two years. I'm somewhat reluctant to see us reintroduce a new backtracing facility, especially one that isn't under our control. | |||||||||||||||||||||
| Comment by Tammy Bailey (Inactive) [ 09/Mar/21 ] | |||||||||||||||||||||
|
bruce.lucas do you mean an evergreen patch build (e.g. to run the testing) or something else? | |||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 09/Mar/21 ] | |||||||||||||||||||||
|
alison.felizzi, tammy.bailey, I discussed this with the perf team to coordinate. For both the general performance testing and for taking look at the heap profiler ideally we would want an Evergreen build to work from. Can you make that available when you feel you have something that is ready to look at? | |||||||||||||||||||||
| Comment by Alison Felizzi (Inactive) [ 09/Mar/21 ] | |||||||||||||||||||||
|
Quick status update: I also spent a little bit of time validating the aarch64 and PPC builds. Appending to the minimum compiler version discoveries in an earlier post:
For aarch64, it's Clang 11+ as minimum. This is due to older versions of aarch64 Clang not having support for 'asm goto' statements with output constraints (as in Clang 9 and Clang 10). Unfortunately I couldn't get older versions of Clang to compile for aarch64 without manually hacking on the ASM statements (which I don't think is an ideal change to maintain). It is important to note that aarch64 is currently maintained on a best effort basis by the maintainers (as opposed to first tier support). Just in case if this influences the decision to maintain an aarch64 Linux build. PPC is failing similarly on earlier compiler versions, as was discovered when doing the initial builds on x86 in the earlier post. Will need to further validate with access to a PPC machine since I've only been able to test with patch builds (having limited flexibility on swapping out and experimenting with different compiler versions). | |||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 05/Mar/21 ] | |||||||||||||||||||||
|
tammy.bailey, understood. I'll work with the performance team to see if we can take a look to get an early start on that discussion in the context of their work. No need at this point for any additional builds beyond that from you. | |||||||||||||||||||||
| Comment by Tammy Bailey (Inactive) [ 05/Mar/21 ] | |||||||||||||||||||||
|
bruce.lucas The first goal of this project - and the intent of this ticket - is to provide a build to the performance team so they can evaluate how the new library performs against workloads that reproduce excessive memory fragmentation. Supporting the heap profiler in a way that meets the needs and requirements of the server team is indeed a future goal of the project, but our understanding (and please correct me if I'm mistaken) was that it was not required for the perf team to perform their evaluation. Thus we performed the most straightforward implementation possible to meet the first goal. We are fully aware that there is discussion to be had around what the final heap profiler implementation should look like with respect to the new library. That said, we are happy to provide an Evergreen build if you'd like to experiment with the new tcmalloc sampling API! | |||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 05/Mar/21 ] | |||||||||||||||||||||
The function and performance of the heap profiler is very important to support and triage as we use it not infrequently to debug memory allocation issues in the field, so we need to take a close look at this. Do you have an evergreen build we could use for that purpose? | |||||||||||||||||||||
| Comment by Alison Felizzi (Inactive) [ 05/Mar/21 ] | |||||||||||||||||||||
|
Status update: I've since added support for compiling the tcmalloc specific utility interfaces (in src/mongo/util) with the new tcmalloc library. I've additionally implemented a variant of the heap profiler that uses the new tcmalloc API (may still be buggy, currently doing some last validations). Some notes on the current approach:
I'll be further working on cleaning up the branch however this is a good point to start sharing the tcmalloc branch (on my mongo fork) so others can now checkout the work and experiment. The branch is found here: https://github.com/alisonfel/mongo/tree/server-54747-tcmalloc
Note: This work has only been tested on an x86_64 Linux platform (will further work on the ARM64 and PPC platforms). Please let me know if there are any issues with compiling and running, will happily help! | |||||||||||||||||||||
| Comment by Alison Felizzi (Inactive) [ 03/Mar/21 ] | |||||||||||||||||||||
|
Status update: I've now have tcmalloc compiling and linking in with the mongodb server binaries. The current exception is that the build skips over the tcmalloc specific utility interfaces (src/mongo/util). Some of the current limitations and compromises of the tcmalloc build, which will need to be addressed before production, include:
Moving forward, my current focus is trying to compile the tcmalloc specific utility interfaces in src/mongo/util with the new tcmalloc library. I'll be trying to determine what parts of the API and its surrounding options are still compatible. | |||||||||||||||||||||
| Comment by Alison Felizzi (Inactive) [ 01/Mar/21 ] | |||||||||||||||||||||
|
Short status update: I've been focusing on developing a build of the new tcmalloc in the MongoDB repository by adding the necessary SCons support. Besides creating a new SConscript file for the vendored tcmalloc repository, I've needed to patch in additional build support to some of tcmalloc's dependencies. TCMalloc is heavily dependent on abseil-cpp, which we additionally vendor in our third party modules (abseil-cpp-master). Our current SCons build of abseil-cpp only covers a limited set of targets/sources however. The additional components tcmalloc is dependent on includes (in Bazel syntax):
Our current abseil checkout is missing the following targets:
The missing targets seem to be used widespread in tcmalloc (i.e I don't believe it would be worth patching out of tcmalloc's source). In order to properly build tcmalloc we may need to also update our abseil checkout to a more recent revision?
As a temporary solution, I'm going to work on a more recent version of abseil (locally) to continue development. I hope to have an initial build of tcmalloc together soon (1 or 2 days). |