[SERVER-49506] Excessive memory being used in pipelines with deeply nested documents Created: 14/Jul/20  Updated: 06/Oct/20  Resolved: 06/Oct/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.2.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Arhelger Assignee: Ian Boros
Resolution: Duplicate Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: out3.png, out3.svg
Issue Links:
Duplicate
duplicates SERVER-40317 $facet execution has no limit on how ... Closed
Related
is related to SERVER-40317 $facet execution has no limit on how ... Closed
Operating System: ALL
Sprint: Query 2020-10-05, Query 2020-10-19

 Description   

Attached are memory allocation flame graphs showing a pair of aggregations with deeply nested documents and $redact using 35GB of memory, or 18.5GB each.

I do not have the exact aggregation and example document(s) to reproduce this issue, but I believe deeply nested documents with $redact should show similar behavior.
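For reference, here is a minimal sketch (using Python and pymongo) of the kind of workload described above. Since the exact aggregation is not available, the collection name, field names, nesting depth, and document count are all illustrative assumptions rather than the original case data:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["test"]["nested"]

def nested_doc(depth):
    # Build a document nested `depth` levels deep (BSON allows up to 100).
    doc = {"level": depth, "tag": "public"}
    for d in range(depth - 1, 0, -1):
        doc = {"level": d, "tag": "public", "child": doc}
    return doc

coll.insert_many(nested_doc(depth=50) for _ in range(10_000))

# $redact re-evaluates its expression at every nesting level, and
# $$DESCEND keeps recursing into subdocuments, so deep nesting
# multiplies the per-document work and allocations.
pipeline = [
    {"$redact": {
        "$cond": {
            "if": {"$eq": ["$tag", "public"]},
            "then": "$$DESCEND",
            "else": "$$PRUNE",
        }
    }},
]
for _ in coll.aggregate(pipeline):
    pass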



 Comments   
Comment by Ian Boros [ 06/Oct/20 ]

After discussion with kevin.arhelger, we've agreed that this is likely a duplicate of SERVER-40317, so I am closing it as such.

Comment by Ian Boros [ 06/Oct/20 ]

Correction to my last comment: the document caching only exists in 4.4 and later, so I would not expect $redact to blow up memory usage in the way I described. The memory usage bug with $facet is the most likely explanation here. As mentioned, see SERVER-40317 for more details.

I was unable to access the logs in the support case as it looks like they've been deleted (attempts to access them result in "The specified key does not exist").

Comment by Ian Boros [ 30/Sep/20 ]

My guess is that the issue here is the combination of $redact, a blocking stage (both a sort and a group appear after the $redact in the flame graph), and $facet.

For some context, $redact walks the input document, which brings each field into the Document's cache. It then builds a new document using MutableDocument, which is fully cached. We know from prior experience that the memory usage of a fully cached Document is far higher than that of a plain BSON object because of all the overhead involved in maintaining the Document's structure (pointers to children, the document's hash table, etc.). A fully cached document can be 3-4x the size of the BSON object that it represents. Putting a lot of fully cached documents into a blocking stage could certainly cause a lot of memory usage.
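To make that concrete, here is a sketch (in the same pymongo style as the example above, with illustrative field names) of the pipeline shape being described; this mirrors the stages visible in the flame graph, not the customer's actual pipeline:

pipeline = [
    # Every document that survives $redact has been rebuilt via
    # MutableDocument and is therefore fully cached, costing roughly
    # 3-4x the size of the BSON it represents.
    {"$redact": {
        "$cond": {
            "if": {"$eq": ["$tag", "public"]},
            "then": "$$DESCEND",
            "else": "$$PRUNE",
        }
    }},
    # Blocking stage: $sort cannot emit a single result until it has
    # consumed, and buffered, every fully cached document from above.
    {"$sort": {"level": 1}},
]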

I also see use of DocumentSourceFacet ($facet) in the flame graph. For a while, $facet did not enforce any limit on its total memory usage. My understanding is that while there were limits on the size of an individual document, $facet would execute each of its sub-pipelines to completion, so as long as each document stayed under a certain size threshold, total memory usage could grow without bound. A fix for this was merged under SERVER-40317 and backported all the way to 3.6.
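For illustration, a sketch of that pre-fix failure mode (the sub-pipeline contents are assumptions, continuing the illustrative field names from above):

pipeline = [
    {"$facet": {
        # Each sub-pipeline runs to completion and its full result array
        # is held in the single output document. Before the SERVER-40317
        # fix, only per-document size checks applied, so a sub-pipeline
        # that passes many documents through, like this $match, could
        # accumulate memory without bound.
        "matching": [{"$match": {"tag": "public"}}],
        "byLevel": [{"$group": {"_id": "$level", "n": {"$sum": 1}}}],
    }},
]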

I'm returning this to "needs scheduling" for discussion.
