[SERVER-61284] $project to exclude an array is expensive and scales poorly as the number of threads increases Created: 05/Nov/21  Updated: 29/Oct/23  Resolved: 20/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.2, 4.2.17, 4.4.10
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Ivan Fefer
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File project.js    
Issue Links:
Depends
Problem/Incident
Related
related to SERVER-70353 Support fast path projection for excl... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-04-04, QE 2022-02-07, QE 2022-02-21, QE 2022-03-07, QE 2022-03-21, QE 2022-01-24, QE 2022-04-18, QE 2022-05-02, QO 2022-05-16, QO 2022-05-30, QO 2022-06-13, QO 2022-06-27, QE 2022-10-17, QE 2022-10-31
Participants:
Linked BF Score: 35

 Description   

Repro script attached. It creates a document of this form:

    doc = {
        other: "xxxxxxx",
        array: new Array(100000).fill("xxxxxxxxxxxxxx")
    }

Then runs an aggregation that uses $project in one of two ways:

  • exclude the array field using {$project: {array: 0}}, or
  • include only the other field using {$project: {other: 1}}.
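
For readers without the attachment, the shape of the repro can be sketched roughly as follows. This is not the attached project.js. The helper names (buildDoc, pipelines, timePipeline) are hypothetical, and `coll` is assumed to behave like a mongo shell collection whose aggregate() returns a cursor with toArray().

```javascript
// Sketch of the repro shape. Only the document fields and the two $project
// specs come from the ticket; every helper name here is hypothetical.

// Build the test document: one small field plus a large array of short strings.
function buildDoc(arrayLength = 100000) {
    return {
        other: "xxxxxxx",
        array: new Array(arrayLength).fill("xxxxxxxxxxxxxx"),
    };
}

// The two pipelines under comparison.
const pipelines = {
    exclude_array: [{$project: {array: 0}}],
    include_other: [{$project: {other: 1}}],
};

// Run one pipeline `iterations` times and return the mean wall-clock time per
// operation in milliseconds. Assumes coll.aggregate(pipeline).toArray() is
// synchronous, as it is in the mongo shell.
function timePipeline(coll, pipeline, iterations = 100) {
    const start = Date.now();
    for (let i = 0; i < iterations; i++) {
        coll.aggregate(pipeline).toArray();
    }
    return (Date.now() - start) / iterations;
}
```

In a shell session this would be used by inserting buildDoc() into a scratch collection and calling timePipeline once per entry in pipelines. The multi-threaded case additionally needs several concurrent clients, which this sketch does not cover.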

The exclusion version is much slower and also scales very poorly as the number of threads increases, even though both compute the same result. The following table shows the time per $project operation for the two cases, single-threaded vs. multi-threaded, showing both

  • the poor single-threaded performance for the exclusive version, and
  • the poor scaling; good scaling would have the multi-threaded time per operation approximately the same as the single-threaded case, up to the number of CPUs.

                                      nthreads=1    nthreads=40
    exclusive: {$project: {array: 0}}      18 ms        631 ms
    inclusive: {$project: {other: 1}}       1 ms          2 ms
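
To make the scaling gap concrete, the per-operation slowdown from 1 to 40 threads can be computed directly from the table above. The snippet below is only arithmetic on those four numbers. The ticket does not state the CPU count of the test machine, so "ideal" here simply means a ratio close to 1.

```javascript
// Times per operation in ms, copied from the table above.
const timings = {
    exclusive: {single: 18, multi: 631},
    inclusive: {single: 1, multi: 2},
};

// Slowdown factor going from 1 to 40 threads; good scaling gives about 1.
function slowdown({single, multi}) {
    return multi / single;
}

for (const [name, t] of Object.entries(timings)) {
    console.log(`${name}: ${slowdown(t).toFixed(1)}x slower per operation at 40 threads`);
}
// exclusive: 35.1x slower per operation at 40 threads
// inclusive: 2.0x slower per operation at 40 threads
```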

FTDC data and PMP (stack trace) profiling taken from a customer incident show that the scaling bottleneck is the allocator. It therefore appears that the exclusive projection performs a very large number of memory allocations, which could explain both the lower single-threaded performance and the poor scaling. In the multi-threaded case, a typical stack for the exclusion projection shows it waiting for access to the allocator central cache:

#0  0x0000561f79a845e6 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1  0x0000561f79a84413 in SpinLock::SlowLock() ()
#2  0x0000561f79a86495 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) ()
#3  0x0000561f79a902c2 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) ()
#4  0x0000561f798ff9bd in mongo::mongoMalloc(unsigned long) ()
#5  0x0000561f796db81d in mongo::RCString::create(mongo::StringData) ()
#6  0x0000561f796d47f9 in mongo::ValueStorage::putString(mongo::StringData) ()
#7  0x0000561f796d9430 in mongo::Value::Value(mongo::BSONElement const&) ()
#8  0x0000561f796d91a7 in mongo::Value::Value(mongo::BSONElement const&) ()
#9  0x0000561f796c4fbf in mongo::DocumentStorage::constructInCache(mongo::BSONElement const&) ()
#10 0x0000561f796c50c3 in mongo::DocumentStorageIterator::constructInCache() ()
#11 0x0000561f789bf160 in mongo::projection_executor::ProjectionNode::applyProjections(mongo::Document const&, mongo::MutableDocument*) const ()
#12 0x0000561f789c1492 in mongo::projection_executor::ProjectionNode::applyToDocument(mongo::Document const&) const ()
#13 0x0000561f789b9cc5 in mongo::projection_executor::ExclusionProjectionExecutor::applyProjection(mongo::Document const&) const ()
#14 0x0000561f7832d3b9 in mongo::projection_executor::ProjectionExecutor::applyTransformation(mongo::Document const&) ()
#15 0x0000561f787caa88 in mongo::DocumentSourceSingleDocumentTransformation::doGetNext() ()
#16 0x0000561f782db0a1 in mongo::DocumentSource::getNext() ()
#17 0x0000561f787d7edc in mongo::Pipeline::getNext() ()
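
One reading of this stack is that iterating the input document constructs an owned value for every element (frames #5 to #10), including the elements of the array that is about to be dropped. The toy model below illustrates how that would inflate the allocation count. It is an assumption-laden sketch in plain JavaScript, not the server's code; both function names are hypothetical, and "allocation" is just a counter.

```javascript
// Toy model, not server code: count per-value "allocations" for two strategies.

// Generic strategy (assumed): turn every field, including each array element,
// into an owned value before deciding which fields to drop.
function excludeByMaterializing(doc, excluded) {
    let allocations = 0;
    const materialize = (v) => {
        if (Array.isArray(v)) return v.map(materialize);
        allocations++; // one owned copy per scalar value
        return v;
    };
    const out = {};
    for (const [k, v] of Object.entries(doc)) {
        const owned = materialize(v);
        if (!excluded.has(k)) out[k] = owned;
    }
    return {out, allocations};
}

// Fast strategy (assumed): skip excluded fields without looking inside them.
function excludeBySkipping(doc, excluded) {
    let allocations = 0;
    const out = {};
    for (const [k, v] of Object.entries(doc)) {
        if (excluded.has(k)) continue;
        allocations++; // counts one per kept top-level field
        out[k] = v;
    }
    return {out, allocations};
}
```

For the document in the description, the first strategy counts 100001 allocations and the second counts 1, while both return {other: "xxxxxxx"}.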



 Comments   
Comment by Githook User [ 20/Oct/22 ]

Author:

{'name': 'Ivan Fefer', 'email': 'ivan.fefer@mongodb.com', 'username': 'Fefer-Ivan'}

Message: SERVER-61284: Support simple projection optimization for simple exclusion projections
Branch: master
https://github.com/mongodb/mongo/commit/9c348fbfafacc93b06d8d4c5521925c97150089f

Comment by Ivan Fefer [ 03/Oct/22 ]

Projections with exclusions require whole documents:
https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/query/projection.cpp#L212

So they are not considered simple:
https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/query/projection.h#L121

So they are not eligible for all optimizations, as stated here:
https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/query/planner_analysis.cpp#L423

But ProjectionNodeSimple actually works with materialized documents, so it is probably fine to move it out of the isSimple check.

Comment by Ivan Fefer [ 03/Oct/22 ]

Looking at implementation of ProjectionStageSimple in Classic engine: https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/exec/projection.cpp#L282

It looks like it would be easy to create a solution that supports exclusion of non-dotted fields as well.
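
As a rough illustration of what a fast path for exclusion of non-dotted fields could look like, here is a minimal sketch in plain JavaScript. It is not the server's ProjectionStageSimple; isSimpleExclusion and applySimpleExclusion are hypothetical names, and the eligibility rule (non-dotted field names, every value 0 or false) is an assumption based on the comments in this ticket.

```javascript
// Hypothetical helpers; names and rules are illustrative, not the server's.

// Treat a spec as a "simple exclusion" when it is non-empty, every field name
// is non-dotted, and every value is 0 or false.
function isSimpleExclusion(spec) {
    const entries = Object.entries(spec);
    return entries.length > 0 &&
        entries.every(([field, v]) => !field.includes(".") && (v === 0 || v === false));
}

// Apply a simple exclusion by copying only the top-level fields that are not
// excluded. Excluded values are never inspected, whatever their size.
function applySimpleExclusion(doc, spec) {
    const excluded = new Set(Object.keys(spec));
    const out = {};
    for (const [field, value] of Object.entries(doc)) {
        if (!excluded.has(field)) out[field] = value;
    }
    return out;
}
```

With this rule, {array: 0} qualifies, while {"a.b": 0} and {other: 1} do not and would fall back to the generic path.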

Comment by Ivan Fefer [ 03/Oct/22 ]

It looks like the main reason for the difference is that we have multiple projection implementations: a generic one that is slow, and faster ones that can only be used in specific cases.

https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/query/query_solution.h#L851

In case of inclusion, we can use this one:
https://github.com/mongodb/mongo/blob/c8d8c12efb6cf1a009dc2e800ca5879b950d54ea/src/mongo/db/query/query_solution.h#L954

> fast-path for when the projection only has inclusions on non-dotted fields
Because in general we don't have a schema, we can't guarantee that we don't have non-dotted fields, so we can't use this version of projection in case of {array: 0}. 

So the fix for this case will be to add this check, possibly using a schema, if one is present, or some metadata.

Comment by Ivan Fefer [ 03/Oct/22 ]

SBE and Classic both have this issue

Comment by Ivan Fefer [ 03/Oct/22 ]

project.js

I didn't see the repro script, so I made my own.
Attaching it just in case.
The problem reproduces:

[js_test:project] starting include_other
[js_test:project] include_other: 2ms
...
[js_test:project] starting exclude_array
[js_test:project] exclude_array: 448ms 

Comment by Xiaochen Wu [ 29/Jun/22 ]

Can someone from the eng team give us a quick ballpark estimate for the fix? david.storch@mongodb.com kyle.suarez@mongodb.com

Comment by David Storch [ 02/May/22 ]

This ticket got abandoned, so I'm marking for re-triage. CC rushan.chen@mongodb.com

Comment by Bruce Lucas (Inactive) [ 05/Nov/21 ]

Thanks for reminding me to check that. I see the same behavior on 5.0.2, 4.4.10, and 4.2.17. The multi-threaded case is actually a little better on 4.2.17, ~400 ms vs ~600 ms.

Comment by Kyle Suarez [ 05/Nov/21 ]

bruce.lucas, what version of MongoDB was this tested on?

Generated at Thu Feb 08 05:52:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.