[SERVER-45418] DocumentSourceCursor batching memory accounting does not account for empty documents, leads to unbounded memory use for count-like aggregates
Created: 08/Jan/20  Updated: 29/Oct/23  Resolved: 19/Feb/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.2.4, 3.6.18, 4.3.4, 4.0.17

Type: Bug
Priority: Major - P3
Reporter: David Storch
Assignee: David Storch
Resolution: Fixed
Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0, v3.6
Steps To Reproduce:

Start a mongod server with the WT cache size configured to 0.2 GB using --wiredTigerCacheSizeGB=0.2. Insert 100,000,000 identical documents, each approximately 300 bytes. Then run the following two queries, and monitor memory consumption:

// This simple collection scan query should warm the cache, and thus should end up resulting in ~0.2GB of memory used.
db.coll.find().itcount();
 
// In contrast, this query also needs to scan the collection. But it ends up using ~1 GB of memory, indicating that the system is unnecessarily consuming lots of memory outside the WT cache.
db.coll.aggregate([{$match: {nonExistent: {$exists: false}}}, {$group: {_id: null, count: {$sum: 1}}}]).toArray();

Sprint: Query 2020-02-24
Participants:

 Description   

During query execution, when documents pass between the PlanStage tree and the pipeline of DocumentSources, they are first buffered in batches using a std::deque by the $cursor stage. The size of the batches is controlled by the internalDocumentSourceCursorBatchSizeBytes setParameter, which defaults to 4MB.

For count-like aggregation queries, this 4MB limit is not respected, leading to unbounded memory consumption. See Steps To Reproduce for an example "count-like" query. In this query, the aggregation pipeline is responsible only for counting documents and does not require any of the data fields to be propagated from the PlanStage tree to the DocumentSource pipeline. This is implemented by pushing empty Documents onto the $cursor stage's std::deque. When the memory accounting code attempts to incorporate the size of these empty Documents, it calls Document::getApproximateSize(). This has no effect, because Document::getApproximateSize() returns 0 for empty Documents. As a result, the std::deque of empty Documents is allowed to grow without bound. In the repro from Steps To Reproduce, the deque becomes millions of elements long and consumes close to 1GB of memory.
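
The accounting gap described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ToyDocument and fillBatch are hypothetical names, not the server's classes, and the payload string merely stands in for the DocumentStorage. The point it shows is that a size check which sees 0 bytes per element never ends the batch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the server's Document; not the real class.
struct ToyDocument {
    std::string payload;  // stands in for the DocumentStorage

    // Models the behavior described in the ticket: only the storage is
    // counted, so an empty document reports a size of 0.
    std::size_t approximateSize() const {
        return payload.size();
    }
};

// Buffers copies of `prototype` until either `available` documents have been
// buffered or the accounted size exceeds `limitBytes`.
// Returns the number of documents buffered.
std::size_t fillBatch(std::deque<ToyDocument>& batch,
                      const ToyDocument& prototype,
                      std::size_t available,
                      std::size_t limitBytes) {
    assert(limitBytes > 0);  // a zero limit is not a meaningful batch size
    std::size_t accountedBytes = 0;
    std::size_t buffered = 0;
    while (buffered < available) {
        batch.push_back(prototype);
        ++buffered;
        accountedBytes += batch.back().approximateSize();
        if (accountedBytes > limitBytes) {
            break;  // the batch is considered full
        }
    }
    return buffered;
}
```

In this model, 300-byte payloads with a 4096-byte limit end the batch after 14 documents, while empty documents are buffered until the input is exhausted, even though each deque element still occupies sizeof(ToyDocument) bytes.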

To fix this, we could explore two approaches:

  • Fix the memory accounting code to include the size of the Document itself, not just the DocumentStorage. Also account for any additional memory consumed by the std::deque.
  • Change how count-like aggregates execute to avoid creating a large deque of empty documents. Theoretically, this buffering is unnecessary. We could simply discard a matching document and simultaneously increment the counter inside the $sum accumulator.
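
Both approaches can be sketched in the same toy model. Everything here is an assumption for illustration: ToyDocument, accountedSize, fillBatchWithObjectSize, and countMatching are hypothetical names, and the sketch ignores the std::deque's own bookkeeping overhead that the first bullet also mentions. The commit messages under Comments suggest the shipped change followed the second direction, but this code is not taken from those commits.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the server's Document; not the real class.
struct ToyDocument {
    std::string payload;  // stands in for the DocumentStorage
};

// Approach 1: charge each buffered element for the object itself as well as
// its payload, so an empty document still has a non-zero accounted size.
std::size_t accountedSize(const ToyDocument& doc) {
    return sizeof(ToyDocument) + doc.payload.size();
}

// Same batching loop as before, but using accountedSize().
// Returns the number of documents buffered.
std::size_t fillBatchWithObjectSize(std::deque<ToyDocument>& batch,
                                    const ToyDocument& prototype,
                                    std::size_t available,
                                    std::size_t limitBytes) {
    assert(limitBytes > 0);
    std::size_t accountedBytes = 0;
    std::size_t buffered = 0;
    while (buffered < available) {
        batch.push_back(prototype);
        ++buffered;
        accountedBytes += accountedSize(batch.back());
        if (accountedBytes > limitBytes) {
            break;  // the limit now also applies to empty documents
        }
    }
    return buffered;
}

// Approach 2: for a count-like query, skip the buffer entirely. Each matching
// item is discarded as soon as it has been counted.
template <typename Source, typename Predicate>
std::size_t countMatching(const Source& source, Predicate matches) {
    std::size_t count = 0;
    for (const auto& item : source) {
        if (matches(item)) {
            ++count;
        }
    }
    return count;
}
```

With the first approach, a batch of empty documents stops after 4096 / sizeof(ToyDocument) + 1 elements for a 4096-byte limit. With the second, memory use does not depend on the number of matching documents at all, since nothing is retained.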


 Comments   
Comment by Githook User [ 02/Mar/20 ]

Author: David Storch (dstorch) <david.storch@mongodb.com>

Message: SERVER-45418 Avoid explicitly batching documents in $cursor for count-like aggregates.

(cherry picked from commit 768e87bbf6213d26f83ad2c526d4aab36e64d185)
Branch: v3.6
https://github.com/mongodb/mongo/commit/bed0e7366eaeaaca150cd66058f37b643ed1c23f

Comment by Githook User [ 27/Feb/20 ]

Author: David Storch (dstorch) <david.storch@mongodb.com>

Message: SERVER-45418 Avoid explicitly batching documents in $cursor for count-like aggregates.

(cherry picked from commit 7c4676ef0e8e47cf79e10b81f7661f8fbea82cb0)
Branch: v4.0
https://github.com/mongodb/mongo/commit/768e87bbf6213d26f83ad2c526d4aab36e64d185

Comment by Githook User [ 25/Feb/20 ]

Author: David Storch (dstorch) <david.storch@mongodb.com>

Message: SERVER-45418 Avoid explicitly batching documents in $cursor for count-like aggregates.
Branch: v4.2
https://github.com/mongodb/mongo/commit/7c4676ef0e8e47cf79e10b81f7661f8fbea82cb0

Comment by Githook User [ 19/Feb/20 ]

Author: David Storch (dstorch) <david.storch@mongodb.com>

Message: SERVER-45418 Avoid explicitly batching documents in $cursor for count-like aggregates.
Branch: master
https://github.com/mongodb/mongo/commit/3e8b3ef099e6cfaeecae4e8fa1c8b5662d9bdaed

Generated at Thu Feb 08 05:08:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.