- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework
- Query Execution
- Fully Compatible
- ALL
- QE 2023-05-15
The internal state maintained by some accumulators, in particular $addToSet and $push, can result in a large memory footprint. For $addToSet and $push, the memory footprint grows, potentially without bound, as new elements are added. As a mitigation, we implemented a 100MB per-accumulator memory limit in SERVER-44174. If either $push or $addToSet memory usage exceeds 100MB, the query will simply fail with an ExceededMemoryLimit error. We subsequently made these memory limits configurable in SERVER-44869. They can be controlled with internalQueryMaxAddToSetBytes and internalQueryMaxPushBytes.
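For illustration, below is a minimal, hypothetical C++ sketch of the kind of per-accumulator accounting these knobs control: the accumulator tracks an approximate byte count and fails the query once it crosses the configured cap. The class and member names are invented for this sketch and are not the server's actual implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a $push accumulator with a SERVER-44174-style cap.
class PushAccumulator {
public:
    explicit PushAccumulator(size_t maxBytes) : _maxBytes(maxBytes) {}

    void add(std::string value) {
        _memUsageBytes += value.size();
        if (_memUsageBytes > _maxBytes) {
            // Analogous to the ExceededMemoryLimit error the server raises.
            throw std::runtime_error("accumulator exceeded its memory limit");
        }
        _values.push_back(std::move(value));
    }

private:
    const size_t _maxBytes;            // e.g. internalQueryMaxPushBytes, 100MB by default
    size_t _memUsageBytes = 0;
    std::vector<std::string> _values;  // the accumulated array
};

int main() {
    PushAccumulator acc(100 * 1024 * 1024);  // 100MB cap, matching the default limit
    acc.add("example element");
    return 0;
}
```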
The implementation of these memory limits functions as intended when the entire hash table fits in memory and no spilling is required. However, they are not enforced correctly when DocumentSourceGroup spills to disk. The spilling algorithm used by DocumentSourceGroup is to flush the entire hash table to a flat spill file outside the storage engine whenever the hash table grows sufficiently large. The data is written so that it is sorted by key. This may happen multiple times, resulting in a spill file that has n sorted segments. Once all of the input is consumed, DocumentSourceGroup switches to a streaming phase in which the partial aggregates are merged and returned to the parent stage. This is done by opening an iterator to each of the sorted segments of the spill file and performing a merge-sort.
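As a rough illustration of that two-phase shape, the sketch below models each spilled flush as an in-memory sorted vector of (key, partial aggregate) pairs and merge-sorts the segments with a min-heap. In the real server the segments live in a flat spill file on disk, so treat this purely as a sketch of the control flow, not of the actual Sorter code.

```cpp
#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;  // (group key, serialized partial aggregate)
using Segment = std::vector<KV>;                 // one sorted spill segment

// Streaming phase: open a cursor into every sorted segment and merge-sort them.
void mergeSegments(const std::vector<Segment>& segments) {
    // Heap entries: (current pair, (segment index, offset within segment)).
    using Entry = std::pair<KV, std::pair<size_t, size_t>>;
    auto greaterByKey = [](const Entry& a, const Entry& b) {
        return a.first.first > b.first.first;
    };
    std::priority_queue<Entry, std::vector<Entry>, decltype(greaterByKey)> heap(greaterByKey);

    for (size_t i = 0; i < segments.size(); ++i) {
        if (!segments[i].empty()) {
            heap.push({segments[i][0], {i, 0}});
        }
    }

    while (!heap.empty()) {
        auto [kv, pos] = heap.top();
        heap.pop();
        // In the real stage, consecutive pairs with the same key are combined
        // into one final group before being returned to the parent stage.
        std::cout << kv.first << " -> " << kv.second << "\n";
        auto [seg, idx] = pos;
        if (idx + 1 < segments[seg].size()) {
            heap.push({segments[seg][idx + 1], {seg, idx + 1}});
        }
    }
}

int main() {
    mergeSegments({{{"a", "[1, 2]"}, {"b", "[3]"}},   // segment from the first spill
                   {{"a", "[4]"}, {"c", "[5]"}}});    // segment from the second spill
    return 0;
}
```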
The problem is that the memory bounds are checked above the level of the sorter::MergeIterator, which actually performs the merge-sort. If there are n spill file segments, and each of them contains the same key k, then all n (key, value) pairs will be deserialized and stored in memory simultaneously. If the values associated with k are large arrays/sets for $push or $addToSet, they can cumulatively consume far more than the 100MB limit. Only some time later is the memory usage associated with these n (key, value) pairs calculated. At that point the query fails with ExceededMemoryLimit, but the damage has already been done: we've seen customer environments where this excessive memory usage causes the OS to OOM-kill the mongod process before the query system can fail the query.
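To make the ordering concrete, here is a hedged sketch of the problematic pattern under the same toy model (invented names, not the actual sorter::MergeIterator code): every segment's value for the duplicate key is materialized before any accounting happens, so with n segments the peak footprint is roughly n times the size of one partial value. For example, 20 segments each holding a ~90MB array for the same key would peak at roughly 1.8GB even though the configured limit is 100MB.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

struct SerializedPair {
    std::string key;
    std::string valueBlob;  // serialized $push/$addToSet array; can be close to 100MB
};

// Stand-in for deserializing one partial value from the spill file.
std::vector<std::string> deserializeValue(const std::string& blob) {
    return {blob};
}

// Problematic shape: materialize every duplicate of the key first, check the limit later.
void mergeOneKey(const std::vector<SerializedPair>& pairsWithSameKey, size_t maxBytes) {
    std::vector<std::vector<std::string>> partials;
    size_t memUsageBytes = 0;
    for (const auto& p : pairsWithSameKey) {
        partials.push_back(deserializeValue(p.valueBlob));  // 1. all n values end up in memory
        memUsageBytes += p.valueBlob.size();
    }
    if (memUsageBytes > maxBytes) {  // 2. the limit is only checked after the fact
        throw std::runtime_error("accumulator exceeded its memory limit");
    }
    std::cout << "merged " << partials.size() << " partial values\n";
}

int main() {
    mergeOneKey({{"k", std::string(50, 'x')}, {"k", std::string(50, 'y')}}, 100 * 1024 * 1024);
    return 0;
}
```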
A potential solution is to change the implementation of the merge-sort phase to eagerly deserialize the keys, but to only deserialize the associated values one-by-one as they are asked for by the caller. I haven't looked into how difficult this would be to implement, though.
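A hedged sketch of what that direction could look like, still under the same toy model: keys are decoded eagerly so the merge-sort can still order them, while each value stays serialized until the caller explicitly materializes it, which gives the caller a chance to enforce the accumulator limit between values. Everything here is hypothetical; it is not a claim about how the Sorter would actually implement the change.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct LazyPair {
    std::string key;        // deserialized eagerly: needed to drive the merge-sort
    std::string valueBlob;  // left serialized until explicitly requested

    // Stand-in for deserializing the value on demand.
    std::vector<std::string> materializeValue() const {
        return {valueBlob};
    }
};

// The caller pulls values one at a time, so the limit is enforced before each
// additional large value is decoded rather than after all of them are in memory.
void mergeOneKeyLazily(const std::vector<LazyPair>& pairsWithSameKey, size_t maxBytes) {
    std::vector<std::string> merged;
    size_t memUsageBytes = 0;
    for (const auto& p : pairsWithSameKey) {
        // Check before decoding the next value, not after decoding all of them.
        if (memUsageBytes + p.valueBlob.size() > maxBytes) {
            throw std::runtime_error("accumulator exceeded its memory limit");
        }
        std::vector<std::string> value = p.materializeValue();
        memUsageBytes += p.valueBlob.size();
        merged.insert(merged.end(), value.begin(), value.end());
    }
}

int main() {
    mergeOneKeyLazily({{"k", "[1, 2]"}, {"k", "[3]"}}, 100 * 1024 * 1024);
    return 0;
}
```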
- is related to: SERVER-44174 $push and $addToSet should restrict memory usage (Closed)
- is related to: SERVER-44869 Add query knob to control memory limit for $push and $addToSet (Closed)