- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Workload Resilience
According to the C++ standard, when an allocation fails for lack of memory, operator new is expected to call the installed new handler. That handler may try to make more memory available (and return so the allocation is retried), terminate the program, or throw a std::bad_alloc exception.
Before MongoDB 8.0, the gperftools/tcmalloc allocator followed this expectation: a failed allocation would invoke the new handler we install, which intentionally crashes the process.
However, this behavior changed in MongoDB 8.0. The new google/tcmalloc allocator explicitly forgoes the standard's expectation: instead of calling the new handler, it crashes the process directly via abort(). As a result, the new handler we install is now dead code for allocations served by TCMalloc.
This changed behavior was not accounted for in the allocator upgrade, and it is causing confusion when tracing how the server responds to out-of-memory errors. Fortunately, from an external perspective, the server's ultimate response to an out-of-memory error is the same across releases: the process crashes. However, there may now be subtle differences in exactly what is logged before the OOM.
This ticket is to determine those differences, remove the dead code, enumerate the allocation pathways that may take different exact steps in response to an OOM, and document the findings in a discoverable way for future readers.