[SERVER-61284] $project to exclude an array is expensive and scales poorly as the number of threads increases
Created: 05/Nov/21 | Updated: 29/Oct/23 | Resolved: 20/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.0.2, 4.2.17, 4.4.10 |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Ivan Fefer |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Sprint: | QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-04-04, QE 2022-02-07, QE 2022-02-21, QE 2022-03-07, QE 2022-03-21, QE 2022-01-24, QE 2022-04-18, QE 2022-05-02, QO 2022-05-16, QO 2022-05-30, QO 2022-06-13, QO 2022-06-27, QE 2022-10-17, QE 2022-10-31 | ||||||||||||||||
| Linked BF Score: | 35 | ||||||||||||||||
| Description |
|
Repro script attached. It creates a document of this form:
Then runs an aggregation that uses $project in one of two ways:
The exclude version is much slower and also scales very poorly as the number of threads increases, even though both compute the same result. The following table shows the time per $project operation for the two cases, single-threaded and multi-threaded.
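As an illustration only (this is not the attached repro script, whose document shape is not reproduced here), the equivalence of the two projection styles can be sketched in plain Python. The field names `a`, `b`, and `arr` and the array size are hypothetical, and the two helper functions model only top-level inclusion and exclusion semantics, not the server's implementation.

```python
def project_include(doc, fields):
    """Keep only the named top-level fields; `_id` is kept by default."""
    keep = set(fields) | {"_id"}
    return {k: v for k, v in doc.items() if k in keep}


def project_exclude(doc, fields):
    """Drop the named top-level fields and keep everything else."""
    drop = set(fields)
    return {k: v for k, v in doc.items() if k not in drop}


# Hypothetical document: two scalar fields plus one large array.
doc = {"_id": 1, "a": 1, "b": 2, "arr": list(range(1000))}

included = project_include(doc, ["a", "b"])  # models {$project: {a: 1, b: 1}}
excluded = project_exclude(doc, ["arr"])     # models {$project: {arr: 0}}

# Both styles describe the same output document.
assert included == excluded
```

Because the outputs are identical, any difference in cost comes from how the server evaluates each form rather than from the result being computed.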
FTDC data and PMP (stack trace) profiling taken from a customer incident show that the scaling bottleneck is the allocator. It therefore appears that the exclusion projection performs a very large number of memory allocations, which could explain both the lower single-threaded performance and the poor scaling. In the multi-threaded case, a typical stack for the exclusion projection shows it waiting for access to the allocator central cache in this stack:
|
| Comments |
| Comment by Githook User [ 20/Oct/22 ] | |||||
|
Author: {'name': 'Ivan Fefer', 'email': 'ivan.fefer@mongodb.com', 'username': 'Fefer-Ivan'} Message: | |||||
| Comment by Ivan Fefer [ 03/Oct/22 ] | |||||
|
Projections with exclusions require whole documents: So they are not considered simple: So they are not eligible for all optimizations, as stated here: But ProjectionNodeSimple actually works with materialized documents, so it is probably fine to move it out of the isSimple check. | |||||
| Comment by Ivan Fefer [ 03/Oct/22 ] | |||||
|
Looking at the implementation of ProjectionStageSimple in the Classic engine: https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/exec/projection.cpp#L282 It looks like it would be easy to create a solution that supports exclusion of non-dotted fields as well. | |||||
| Comment by Ivan Fefer [ 03/Oct/22 ] | |||||
|
It looks like the main reason for the difference is that we have multiple projection implementations: a generic one that is slow, and faster ones that can only be used in specific cases. In the case of inclusion, we can use this one: > fast-path for when the projection only has inclusions on non-dotted fields So the fix for this case will be to add this check, possibly using a schema, if one is present, or some metadata. | |||||
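The eligibility condition quoted in this comment ("only has inclusions on non-dotted fields") can be modeled with a small Python sketch. This is an assumption-laden illustration, not the server's code: `uses_fast_path` is a hypothetical name, the spec is simplified to a mapping from field path to 1 or 0, `_id` is ignored, and computed expressions are not modeled.

```python
def uses_fast_path(spec):
    """Model the quoted condition: only inclusions, only non-dotted fields.

    `spec` maps a field path to 1 (include) or 0 (exclude). This helper is
    hypothetical and ignores `_id` and computed expressions.
    """
    fields = {path: flag for path, flag in spec.items() if path != "_id"}
    if not fields:
        return False
    only_inclusions = all(flag == 1 for flag in fields.values())
    no_dotted_paths = all("." not in path for path in fields)
    return only_inclusions and no_dotted_paths


# Inclusion of top-level fields qualifies under the quoted condition.
print(uses_fast_path({"a": 1, "b": 1}))   # True
# Exclusion of a top-level array does not, so it falls to the generic path.
print(uses_fast_path({"arr": 0}))         # False
# A dotted path does not qualify either.
print(uses_fast_path({"a.b": 1}))         # False
```

Under this model, the exclusion projection from the description never reaches the faster implementation, which matches the behavior reported in the ticket.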
| Comment by Ivan Fefer [ 03/Oct/22 ] | |||||
|
SBE and Classic both have this issue | |||||
| Comment by Ivan Fefer [ 03/Oct/22 ] | |||||
|
Didn't see the repro script, so I made my own.
| |||||
| Comment by Xiaochen Wu [ 29/Jun/22 ] | |||||
|
Can someone from the eng team give us a quick ballpark estimate for the fix? david.storch@mongodb.com kyle.suarez@mongodb.com | |||||
| Comment by David Storch [ 02/May/22 ] | |||||
|
This ticket got abandoned, so I'm marking for re-triage. CC rushan.chen@mongodb.com | |||||
| Comment by Bruce Lucas (Inactive) [ 05/Nov/21 ] | |||||
|
Thanks for reminding me to check that. I see the same behavior on 5.0.2, 4.4.10, and 4.2.17. The multi-threaded case is actually a little better on 4.2.17, ~400 ms vs ~600 ms. | |||||
| Comment by Kyle Suarez [ 05/Nov/21 ] | |||||
|
bruce.lucas, what version of MongoDB was this tested on? |