[SERVER-63710] CSFLE: LeakSanitizer warnings after unloading the dynamic library Created: 16/Feb/22 Updated: 27/Oct/23 Resolved: 24/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Field Level Encryption |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Colby Pike | Assignee: | Backlog - Security Team |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | FLE, csfle | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Server Security
|
| Operating System: | ALL |
| Participants: |
| Description |
|
On Linux, if an application dlopens the csfle dynamic library and then dlcloses it, LeakSanitizer warns about memory leaks at program exit. Omitting the call to dlclose (or opening with RTLD_NODELETE) makes the leaks vanish. No library APIs need to be invoked for the behavior to show.
Unfortunately, LeakSanitizer cannot render the symbols, because the dynamic library has already been removed from the memory mappings by the time LeakSanitizer reports the leaks at program shutdown.
Our best guess is that the leaks come from statically constructed objects that are built during dlopen as part of library initialization but are never destroyed as part of dlclose. The visible effect is the LeakSanitizer warning, but it also means that any other important side effects of static destructors never occur.
Possible hint: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71971, which relates to a quirk of ELF and ld-linux when static symbols are annotated with UNIQUE; however, no symbols in the library appear to have that annotation.
I can verify that the call to dlopen transitively invokes __cxa_atexit to register static destructors, but I do not know why they do not run on dlclose.
Because dlopen does not allocate duplicate mappings when the same library is opened repeatedly, it is safe to never close the library and let the static destructors run at program shutdown. This is a viable workaround unless we need static destructors to run immediately on dlclose.
The following code is enough to reproduce the issue:
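The reproduction snippet itself is not preserved in this export. A minimal sketch of the load-then-unload sequence described above could look like the following. The helper name `load_then_unload` and its `keep_mapped` parameter are hypothetical, and the path to the csfle library has to be supplied by the caller.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: load a shared library and immediately unload it.
 *   path        - library to open (the csfle library in this ticket)
 *   keep_mapped - nonzero adds RTLD_NODELETE, so dlclose() leaves the
 *                 library mapped (the workaround named in the description)
 * Returns 0 on success, -1 if dlopen() or dlclose() fails.
 */
static int load_then_unload(const char *path, int keep_mapped)
{
    assert(path != NULL);

    int flags = RTLD_NOW | RTLD_LOCAL;
    if (keep_mapped)
        flags |= RTLD_NODELETE;

    void *handle = dlopen(path, flags);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* No library API is called: per the description, loading alone is enough. */
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return -1;
    }
    return 0;
}
```

Calling this from `main` in a program built with `-fsanitize=address` (plus `-ldl` on older glibc) should, according to the description, produce the leak report when `keep_mapped` is 0 and no report when it is 1.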
|
| Comments |
| Comment by Anna Henningsen [ 25/Feb/22 ] | ||||||||
|
mark.benvenuto I get that this is a fairly minor issue, but that doesn’t seem true. On my Linux system (glibc 2.31), I can download the shared library, attach a breakpoint to a static destructor in gdb, and watch it being run as part of the library’s destructors via __cxa_finalize, as it should (complete standalone bash script repro: https://0bin.corp.mongodb.com/paste/toPNRJEj#SOlZIdqnEuz1JRV8QkmNeHqrIWlYQDV01KZgiEuhWSL). That also holds true for other small C++ example shared libraries. By modifying the test code from that script (or from the ticket description) to call __cxa_finalize directly, i.e.
instead of dlclose(handle), one can avoid the problem mentioned in the ticket description (leak detection tools cannot resolve symbols once dlclose() has unmapped the library) and still get proper leak detection. E.g. with gcc -o test test.c -ldl && valgrind --leak-check=full --show-leak-kinds=all ./test, one can get results like these:
which point to lines in source like: https://github.com/10gen/mongo/blob/626672c9de5486f48c234b709e019d927a7121b2/src/mongo/base/initializer.cpp#L196 or which are obviously real, if limited, memory leaks. So the tl;dr here is: Static destructors do run, but not all of the mongo codebase uses RAII for global singletons. That means that there is something actionable here. Currently, on Linux, libmongocrypt does not unload the shared library, to avoid memory leaks from repeated loads; on other platforms it does unload the library. I assume that for now, libmongocrypt should not unload the shared library on any platform, to sidestep this problem. Whether this is worth taking action on in the server code is a question that I can’t answer. It’s not obvious to me why the examples above require heap allocation, but I’m sure somebody spent time thinking about that, and I don’t really want to spend more time digging into this, since I’m standing on the sidelines here rather than being actively involved. | ||||||||
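The exact snippet from the comment above is not preserved in this export. As a rough sketch of the same idea on glibc, a test program can run the registered static destructors through `__cxa_finalize` while leaving the library mapped. This sketch passes NULL, which the Itanium C++ ABI defines as "run every registered handler", so it also runs the program's own `atexit` handlers. That is an assumption made for simplicity and is not necessarily what the original test did. The helper name `load_then_finalize` is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Runtime entry point from the Itanium C++ ABI; exported by glibc. */
extern void __cxa_finalize(void *dso_handle);

/*
 * Hypothetical helper: load a library, then run static destructors
 * without unmapping it, so leak checkers can still resolve symbols.
 * Returns 0 on success, -1 if dlopen() fails.
 */
static int load_then_finalize(const char *path)
{
    assert(path != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /*
     * NULL means "all handlers registered via __cxa_atexit", not only
     * those of the library just loaded. Acceptable in a throwaway test
     * program; not something to do in production code.
     */
    __cxa_finalize(NULL);

    /* Deliberately no dlclose(handle): the mapping stays in place. */
    return 0;
}
```

Anything still reported by valgrind or LeakSanitizer after this call was not released by a static destructor, which matches the conclusion drawn in the comment.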
| Comment by Mark Benvenuto [ 24/Feb/22 ] | ||||||||
|
I do not know of other C++ libraries where this may work. The static destructors are stored in the .dtors section and are not called on dlclose on Linux. | ||||||||
| Comment by Anna Henningsen [ 24/Feb/22 ] | ||||||||
|
mark.benvenuto Do you happen to know why this happens only with this shared library, not others? | ||||||||
| Comment by Mark Benvenuto [ 24/Feb/22 ] | ||||||||
|
As the ticket states, this is unfortunately by design: when glibc unloads the library, it does not call static destructors. While the library tries to support unloading, there is nothing we can do if the underlying OS dynamic library loader does not call static destructors. |