[SERVER-24375] Deduping in OR, SORT_MERGE, and IXSCAN (multikey case) uses unbounded memory Created: 02/Jun/16  Updated: 13/Oct/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 5
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File heap-profile.png     PNG File memory-growth-calltree.png     File repro.sh    
Issue Links:
Duplicate
is duplicated by SERVER-36087 Executing $text statements in conjunc... Closed
is duplicated by SERVER-123 multi-key _id deduping uses a lot of ... Closed
Related
related to SERVER-82167 $regex may use unbounded cpu and memory Investigating
related to SERVER-20239 Built-in sampling heap profiler Closed
related to SERVER-36794 Non-blocking $text plans with just on... Closed
is related to SERVER-26534 Text search uses excessive memory Backlog
Assigned Teams:
Query Execution
Operating System: ALL
Sprint: QE 2022-10-17, QE 2022-10-31, QE 2022-11-14, QE 2022-11-28, QE 2022-12-12, QE 2022-12-26, QE 2023-01-09
Participants:
Case:

 Description   

Each of the stages listed in the title keeps a set of RecordIds; these are used to identify seen documents in order to ensure that we do not return the same document twice to the user. However, this requires memory proportional to the number of documents processed, and nothing is in place to ensure that we do not consume too much. One example of how to reproduce this unbounded memory growth is given below.

  • 40 M documents of the form {x:0, y:0}
  • index on {y:1}
  • query of the following form (full repro script attached)

        q = {$or: [{x: 0, y: 0}, {x: 0, y: 0}]}
        db.c.find(q).hint({y: 1}).sort({z: -1}).limit(30).itcount()

Heap profile call tree shows memory usage by OrStage::work grow to about 1.5 GB as it scans the collection, then drop back to 0 at conclusion of query. Graph in each row shows memory usage for that node and its descendants; second number in each row is max memory in MB for that node.



 Comments   
Comment by Kyle Suarez [ 26/Sep/22 ]

Sending this back to the Query Execution triage queue for re-discussion, and to come up with a SWAG estimate on how much work this would be to implement (in the classic engine). After adding the estimate, we can pass this to the Product team for re-prioritization.

Comment by David Storch [ 21/Aug/18 ]

bruce.lucas points out that there is an opportunity to bound the amount of memory used for deduping specifically for top-k SORT plans. Such plans need only retain the RecordIds of the current top k set for deduping purposes. Right now, the deduping is done by the underlying OR or IXSCAN stage beneath the SORT, which requires us to keep the RecordIds of all documents seen so far, not just the top k.

Comment by Nimesh Shah [ 14/Feb/18 ]

Is there any plan to fix this issue? if so, which version?

Comment by David Storch [ 08/Jun/16 ]

bruce.lucas, I think we should generalize this ticket to include anywhere that we keep a set of RecordIds for deduplication that can grow without bound during the execution of the query. IXSCAN and OR are the places that come to mind where we definitely do this.

Comment by Bruce Lucas (Inactive) [ 08/Jun/16 ]

david.storch, here's a heap profile collected by a customer for a similar but more complex query involving a $or stage. (The simplified query on this ticket came from an attempt to reproduce that customer's issue.) Note that memory is only indirectly allocated by OrStage::work, but is directly allocated instead by IndexScan::work. I think this means we either need to generalize this ticket to cover limiting memory allocated generally by the query engine, or open a separate ticket covering memory allocated by IndexScan::work - what do you think?

Comment by David Storch [ 02/Jun/16 ]

The OR stage keeps a set of RecordIds of unbounded size in order to deduplicate results from its children.

Generated at Thu Feb 08 04:06:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.