Core Server / SERVER-73315

Guardrail to prevent an accumulator from using too much memory is ineffective when DocumentSourceGroup spills to disk

    • Component: Query Execution
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      I was able to confirm the excessive memory usage against a recent version of master (git revision 7f50a907063adeba488bd5e344dc8b94f3865efd) by first starting a server that forces use of the classic query engine:

      ./mongod --setParameter internalQueryFrameworkControl="forceClassicEngine"
      

      Then I ran the following repro script:

      // Create a situation where an individual $push will exceed its memory limit only after spilling.
      (function() {
      "use strict";
      
      const numGroups = 20;
      const docsPerGroup = 50;
      const totalDocs = numGroups * docsPerGroup;
      
      const stringSize = 10 * 1024 * 1024;
      const largeString = (new Array(stringSize)).join("-");
      
      let coll = db.pushSpill;
      coll.drop();
      
      for (let i = 0; i < totalDocs; ++i) {
          let doc = {_id: i, group: i % numGroups, padding: largeString};
          assert.commandWorked(coll.insert(doc));
      }
      print("done inserting data");
      
      let explain =
          coll.explain("executionStats")
              .aggregate([{$group: {_id: "$group", res: {$push: "$padding"}}}], {allowDiskUse: true});
      printjson(explain);
      }());
      

      In the example from this script, there are 20 groups, each consisting of 50 strings of roughly 10MB, so each group's final $push array holds about 500MB and must exceed the 100MB limit. We would expect spilling to kick in several times as the input is consumed. During the merging phase, the partial $push arrays must be concatenated to produce the final result, and only at this point do we expect the 100MB $push limit to be exceeded. As expected, this script produces the following error:

      Error: explain failed: {
      	"ok" : 0,
      	"errmsg" : "$push used too much memory and cannot spill to disk. Memory limit: 104857600 bytes",
      	"code" : 146,
      	"codeName" : "ExceededMemoryLimit"
      } :
      _getErrorWithCode@src/mongo/shell/utils.js:24:13
      throwOrReturn@src/mongo/shell/explainable.js:25:19
      constructor/this.aggregate@src/mongo/shell/explainable.js:125:24
      @help38594_repro.js:23:10
      @help38594_repro.js:25:2
      failed to load: help38594_repro.js
      

      By applying the following patch, I was able to observe that the memory consumed during this test far exceeded the 100MB limit:

      diff --git a/src/mongo/db/sorter/sorter.cpp b/src/mongo/db/sorter/sorter.cpp
      index 411c85ffbdb..fae1f8505d9 100644
      --- a/src/mongo/db/sorter/sorter.cpp
      +++ b/src/mongo/db/sorter/sorter.cpp
      @@ -466,6 +466,10 @@ public:
           Data next() {
               verify(_remaining);
      
      +        long long memUsage = estimateMemUsage();
      +        std::cout << "!!! storchprint Sorter merge iterator mem usage estimate: " << memUsage
      +                  << std::endl;
      +
               _remaining--;
      
               if (_positioned) {
      @@ -548,6 +552,16 @@ private:
               const Comparator _comp;
           };
      
      +    long long estimateMemUsage() {
      +        long long memUsage = 0;
      +        for (auto&& heapEntry : _heap) {
      +            auto& data = heapEntry->current();
      +            memUsage += data.first.memUsageForSorter();
      +            memUsage += data.second.memUsageForSorter();
      +        }
      +        return memUsage;
      +    }
      +
           SortOptions _opts;
           unsigned long long _remaining;
           bool _positioned;
      

      This patch instruments the merge-sort code to print an estimate of the memory held by the merge-sort's heap data structure to standard output. On my machine, the estimate reached roughly 990MB, which exceeds the expected 100MB limit by nearly a factor of 10.
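      The ~990MB figure is roughly what the shape of the data predicts, assuming the classic $group hash table spills once it reaches about 100MB (I believe this is the internalDocumentSourceGroupMaxMemoryBytes default, but treat the exact threshold as an assumption). A back-of-the-envelope check, in the same shell JavaScript as the repro:

      // Rough estimate of the merge heap's peak size for this repro. The ~100MB
      // spill threshold for the $group hash table is an assumption.
      const docBytes = 10 * 1024 * 1024;              // one ~10MB padding string per document
      const totalDocs = 20 * 50;                      // 1000 documents, ~10GB of input
      const spillThresholdBytes = 100 * 1024 * 1024;  // assumed hash table spill point

      const docsPerSpill = Math.floor(spillThresholdBytes / docBytes);  // ~10 docs per flush
      const numSegments = Math.ceil(totalDocs / docsPerSpill);          // ~100 sorted segments

      // With the i % 20 grouping, ~10 consecutive documents hit ~10 distinct groups,
      // so each segment's current (key, value) pair holds about one ~10MB string and
      // the merge heap keeps one such pair per segment.
      const peakHeapMB = numSegments * docBytes / (1024 * 1024);
      print("~" + numSegments + " segments x ~10MB each => ~" + peakHeapMB + "MB resident during the merge");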

    • QE 2023-05-15

      The internal state maintained by some accumulators, in particular $addToSet and $push, can result in a large memory footprint. For $addToSet and $push, the memory footprint grows, potentially without bound, as new elements are added. As a mitigation, we implemented a 100MB per-accumulator memory limit in SERVER-44174. If either $push or $addToSet memory usage exceeds 100MB, the query will simply fail with an ExceededMemoryLimit error. We subsequently made these memory limits configurable in SERVER-44869. They can be controlled with internalQueryMaxAddToSetBytes and internalQueryMaxPushBytes.
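      For reference, both knobs can be read with getParameter and, assuming they behave like other runtime-settable internalQuery* parameters, adjusted without a restart. Values are in bytes; 104857600 is the 100MB default that appears in the error message above.

      // Inspect the current per-accumulator limits (values in bytes).
      db.adminCommand({
          getParameter: 1,
          internalQueryMaxPushBytes: 1,
          internalQueryMaxAddToSetBytes: 1
      });

      // Raise the $push limit, e.g. to 200MB. This only moves the guardrail; it
      // does not address the enforcement gap described below.
      db.adminCommand({setParameter: 1, internalQueryMaxPushBytes: 200 * 1024 * 1024});

      // The same parameters can also be set at startup, e.g.:
      //   ./mongod --setParameter internalQueryMaxPushBytes=209715200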

      The implementation of these memory limits functions as intended when the entire hash table fits in memory and no spilling is required. However, they are not enforced correctly when DocumentSourceGroup spills to disk. The spilling algorithm used by DocumentSourceGroup is to flush the entire hash table to a flat spill file outside the storage engine whenever the hash table grows sufficiently large. The data is written so that it is sorted by key. This may happen multiple times, resulting in a spill file that has n sorted segments. Once all of the input is consumed, DocumentSourceGroup switches to a streaming phase in which the partial aggregates are merged and returned to the parent stage. This is done by opening an iterator to each of the sorted segments of the spill file and performing a merge-sort.
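      To make that lifecycle concrete, the following is a deliberately simplified sketch in shell-style JavaScript, not the actual C++ in DocumentSourceGroup or sorter.cpp; the names and the ~100MB spill threshold are assumptions made for illustration. It models only the spill phase: whenever the hash table grows too large, it is flushed, sorted by key, into a new segment.

      // Simplified model of the spill phase. JSON.stringify stands in for
      // serializing a partial aggregate to the spill file; the group key is kept
      // inline because keys are small, whereas in the real file both key and
      // value are serialized.
      const kSpillThresholdBytes = 100 * 1024 * 1024;  // assumed hash table limit
      const segments = [];        // each element models one sorted run in the spill file
      let hashTable = new Map();  // groupKey -> partially accumulated $push array

      function approximateTableBytes(table) {
          let bytes = 0;
          for (const arr of table.values())
              for (const str of arr)
                  bytes += str.length;
          return bytes;
      }

      function spill() {
          // Flush the entire table as one new segment, sorted by key.
          const sorted = Array.from(hashTable.entries()).sort((a, b) => a[0] - b[0]);
          segments.push(sorted.map(([key, arr]) => ({key: key, rawValue: JSON.stringify(arr)})));
          hashTable = new Map();
      }

      function absorb(doc) {
          const arr = hashTable.get(doc.group) || [];
          arr.push(doc.padding);
          hashTable.set(doc.group, arr);
          if (approximateTableBytes(hashTable) > kSpillThresholdBytes)
              spill();  // may happen many times, producing n sorted segments
      }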

      The problem is that the memory bounds are checked above the level of the sorter::MergeIterator which is actually performing the merge-sort. If there are n spilled file segments, and each of them has the same key k, then all n (key, value) pairs will be deserialized and stored in memory simultaneously. If the values associated with k are large arrays/sets for $push or $addToSet, then they can cumulatively consume much more than the 100MB limit. Only some time later will the memory usage associated with these n (key, value) pairs be calculated. At this point, the query would fail with ExceededMemoryLimit, but the damage has already been done. We've seen customer environments where this excessive memory usage causes the OS to OOM-kill the mongod process before the query system fails the query.
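      Continuing the sketch above, the merge phase below mirrors that problematic ordering: every segment cursor eagerly materializes its current value, so with n segments up to n large partial arrays are resident before anything resembling the $push memory check can run. Again, this is an illustration of the described behavior, not the real sorter::MergeIterator.

      // Simplified model of the merge phase over the segments produced above.
      function* mergeSegments(segments) {
          // Each cursor eagerly deserializes its *current* entry's value.
          const cursors = segments
              .filter(seg => seg.length > 0)
              .map(seg => ({seg: seg, idx: 0, value: JSON.parse(seg[0].rawValue)}));

          const exhausted = c => c.idx >= c.seg.length;

          while (cursors.some(c => !exhausted(c))) {
              const live = cursors.filter(c => !exhausted(c));
              const minKey = live.map(c => c.seg[c.idx].key).sort((a, b) => a - b)[0];

              // Concatenate every segment's partial array for the smallest key. Only
              // after this point could a 100MB check on the merged result fire.
              let merged = [];
              for (const c of live) {
                  if (c.seg[c.idx].key === minKey) {
                      merged = merged.concat(c.value);
                      c.idx++;
                      c.value = exhausted(c) ? null : JSON.parse(c.seg[c.idx].rawValue);
                  }
              }
              yield [minKey, merged];
          }
      }

      In the real code the segments live in a spill file rather than in JavaScript arrays, but the ordering of events is the same: the sorter deserializes the pairs, while the memory accounting lives in the accumulators above it.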

      A potential solution is to change the implementation of the merge-sort phase to eagerly deserialize the keys, but to only deserialize the associated values one-by-one as they are asked for by the caller. I haven't looked into how difficult this would be to implement, though.
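      One way to picture that suggestion, again only as a sketch of the idea rather than a concrete design for sorter.cpp: keep the (small) keys available for the merge comparisons, but leave each value serialized until the moment it is handed to the caller, so that at most one large partial value is materialized per step.

      // Sketch of the suggested lazy approach, reusing the segment model above:
      // cursors compare only the small inline keys, and a value is deserialized
      // just before it is yielded, one partial aggregate at a time.
      function* mergeSegmentsLazily(segments) {
          const cursors = segments
              .filter(seg => seg.length > 0)
              .map(seg => ({seg: seg, idx: 0}));

          const exhausted = c => c.idx >= c.seg.length;

          while (cursors.some(c => !exhausted(c))) {
              const live = cursors.filter(c => !exhausted(c));
              const minKey = live.map(c => c.seg[c.idx].key).sort((a, b) => a - b)[0];

              for (const c of live) {
                  if (c.seg[c.idx].key === minKey) {
                      // Materialize this one value only now, so the caller can check
                      // its memory budget between pieces instead of after the fact.
                      yield [minKey, JSON.parse(c.seg[c.idx].rawValue)];
                      c.idx++;
                  }
              }
          }
      }

      With this shape, the existing accumulator-level checks could run after each partial value is appended, before the next one is deserialized.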

            Assignee:
            amr.elhelw@mongodb.com Amr Elhelw
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0
            Watchers:
            17

              Created:
              Updated:
              Resolved: