Core Server / SERVER-45418

DocumentSourceCursor batching memory accounting does not account for empty documents, leads to unbounded memory use for count-like aggregates

    • Fully Compatible
    • ALL
    • v4.2, v4.0, v3.6

      Start a mongod server with the WT cache size configured to 0.2 GB using --wiredTigerCacheSizeGB=0.2. Insert 100,000,000 identical documents, each approximately 300 bytes. Then run the following two queries, and monitor memory consumption:

      // This simple collection scan query should warm the cache, and thus should end up resulting in ~0.2GB of memory used.
      db.coll.find().itcount();
      
      // In contrast, this query also needs to scan the collection. But it ends up using ~1 GB of memory, indicating that the system is unnecessarily consuming lots of memory outside the WT cache.
      db.coll.aggregate([{$match: {nonExistent: {$exists: false}}}, {$group: {_id: null, count: {$sum: 1}}}]).toArray();
      
    • Query 2020-02-24

      During query execution, when documents pass from the PlanStage tree to the pipeline of DocumentSources, the $cursor stage first buffers them in batches in a std::deque. The size of each batch is controlled by the internalDocumentSourceCursorBatchSizeBytes setParameter, which defaults to 4MB.

      For count-like aggregation queries, this 4MB limit is not respected, leading to unbounded memory consumption. The repro steps above give an example of a "count-like" query: the aggregation pipeline is responsible only for counting documents, so none of the data fields need to be propagated from the PlanStage tree to the DocumentSource pipeline. This is implemented by pushing empty Documents onto the $cursor stage's std::deque. When the memory accounting code attempts to incorporate the size of these empty Documents, it calls Document::getApproximateSize(), which has no effect because it returns 0 for an empty Document. As a result, the std::deque of empty Documents is allowed to grow without bound. In the repro described above, the deque reaches millions of elements and consumes close to 1GB of memory.
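      The failure mode can be illustrated with a small stand-alone sketch. The `Doc` struct and `fillBatch` loop below are hypothetical simplified stand-ins, not the actual server classes: a batch-fill loop that charges each buffered document its getApproximateSize() never reaches the 4MB cap when every document is empty, because empty documents report a size of 0.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <deque>
      #include <iostream>

      // Hypothetical stand-in for mongo::Document: an empty document owns no
      // DocumentStorage, so its approximate size is reported as 0 bytes.
      struct Doc {
          size_t payloadBytes;  // size of the backing storage, if any
          size_t getApproximateSize() const { return payloadBytes; }
      };

      // Simplified model of the $cursor stage's batch-fill loop: buffer
      // documents until the *accounted* bytes reach the limit (4MB by
      // default, per internalDocumentSourceCursorBatchSizeBytes) or the
      // input is exhausted.
      size_t fillBatch(std::deque<Doc>& batch, size_t docBytes,
                       size_t numDocs, size_t limitBytes) {
          size_t accountedBytes = 0;
          for (size_t i = 0; i < numDocs && accountedBytes < limitBytes; ++i) {
              batch.push_back(Doc{docBytes});
              accountedBytes += batch.back().getApproximateSize();
          }
          return accountedBytes;
      }

      int main() {
          const size_t kLimit = 4 * 1024 * 1024;  // 4MB

          // Normal case: ~300-byte documents. The loop stops near the cap.
          std::deque<Doc> normal;
          fillBatch(normal, 300, 10'000'000, kLimit);
          std::cout << "normal batch: " << normal.size() << " docs\n";

          // Count-like case: empty documents report size 0, so the
          // accounting never advances and the deque absorbs all input.
          std::deque<Doc> empty;
          size_t accounted = fillBatch(empty, 0, 10'000'000, kLimit);
          std::cout << "empty batch: " << empty.size()
                    << " docs, accounted " << accounted << " bytes\n";
          assert(empty.size() == 10'000'000 && accounted == 0);
          return 0;
      }
      ```

      With 300-byte documents the batch is capped at roughly 14,000 elements; with empty documents the deque grows to the full 10 million.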

      To fix this, we could explore a few approaches:

      • Fix the memory accounting code to include the size of the Document itself, not just the DocumentStorage. Also account for any additional memory consumed by the std::deque.
      • Change how count-like aggregates execute to avoid creating a large deque of empty documents. Theoretically, this buffering is unnecessary. We could simply discard a matching document and simultaneously increment the counter inside the $sum accumulator.
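      A sketch of the first approach, again using a hypothetical simplified `Doc` rather than the real server types: charge each buffered element the storage size plus the fixed footprint of the Document object itself, so that even empty documents advance the accounting and the cap holds.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <deque>
      #include <iostream>

      // Hypothetical stand-in for mongo::Document: empty documents report
      // an approximate storage size of 0.
      struct Doc {
          size_t payloadBytes = 0;
          size_t getApproximateSize() const { return payloadBytes; }
      };

      // Fixed accounting: charge the storage size *plus* sizeof(Doc), so
      // even an empty document contributes a nonzero amount per element.
      size_t charge(const Doc& d) {
          return d.getApproximateSize() + sizeof(Doc);
      }

      int main() {
          const size_t kLimit = 4 * 1024 * 1024;  // 4MB
          std::deque<Doc> batch;
          size_t accountedBytes = 0;
          // Try to buffer 100 million empty documents; the cap now holds.
          for (size_t i = 0; i < 100'000'000 && accountedBytes < kLimit; ++i) {
              batch.push_back(Doc{});
              accountedBytes += charge(batch.back());
          }
          std::cout << "buffered " << batch.size() << " empty docs, accounted "
                    << accountedBytes << " bytes\n";
          assert(accountedBytes >= kLimit);    // cap triggered
          assert(batch.size() < 100'000'000);  // deque stays bounded
          return 0;
      }
      ```

      A fuller fix would also account for the std::deque's own per-block allocations, which this sketch omits.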

            Assignee:
            David Storch (david.storch@mongodb.com)
            Reporter:
            David Storch (david.storch@mongodb.com)
            Votes:
            0
            Watchers:
            10

              Created:
              Updated:
              Resolved: