[SERVER-77942] Performance regressions in $graphLookup due to makePipeline
Created: 09/Jun/23
Updated: 18/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Irina Yatsenko (Inactive)
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 0
Labels: qe-perf-90
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2023-06-09-06-45-27-383.png     PNG File image-2023-06-09-06-46-19-142.png    
Issue Links:
Issue split
Related
related to SERVER-81521 Investigate performance regressions i... Backlog
Assigned Teams:
Query Execution
Operating System: ALL
Participants:

 Description   

Mongo-perf shows a 15% regression in the LookupViaGraphLookup test between 6.0 and 7.0, which I believe is caused by various slowdowns under mongo::Pipeline::makePipeline. Notably, 7.0 spends more time under mongo::getExecutor and in AutoGetCollectionForReadCommandMaybeLockFree but, unfortunately, there is no single point of regression.

https://jira.mongodb.org/browse/BF-28421 concerns some other Lookup-related benchmarks but claims they are not representative of real use cases. I don't think the same reasoning applies to LookupViaGraphLookup, as I can repro the regression on a collection that represents a binary tree via a parent link, with a query that outputs the direct children of each node: a typical scenario in hierarchical datasets.
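For illustration, the binary-tree scenario could look roughly like the sketch below. The ticket does not include the actual repro, so the collection name "tree", the field name "parent", the node numbering and the collection size are all assumptions; maxDepth: 0 is used to limit the traversal to direct children. The script only builds the documents and the pipeline when run outside the shell, and talks to the server only when a mongosh-style "db" object is present.

```javascript
// Hypothetical repro data: a binary tree where each node stores its parent's _id.
// Collection name, field names and NODE_COUNT are assumptions, not from the ticket.
const NODE_COUNT = 15;
const treeDocs = [];
for (let i = 0; i < NODE_COUNT; i++) {
  // Heap-style numbering: the parent of node i is floor((i - 1) / 2); the root has none.
  treeDocs.push({ _id: i, parent: i === 0 ? null : Math.floor((i - 1) / 2) });
}

// For each node, find the documents whose "parent" equals this node's _id.
// maxDepth: 0 stops after the first hop, so only direct children are returned.
const childrenPipeline = [
  {
    $graphLookup: {
      from: "tree",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent",
      maxDepth: 0,
      as: "children",
    },
  },
];

// Only touch the database when running inside mongosh (where "db" is defined).
if (typeof db !== "undefined") {
  db.tree.drop();
  db.tree.insertMany(treeDocs);
  printjson(db.tree.aggregate(childrenPipeline).toArray());
}
```

With this shape, every document in the local collection starts its own lookup, which is the pattern the ticket describes as sensitive to makePipeline() overhead.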

https://jira.mongodb.org/browse/BF-28050 is another ticket related to regressions in Lookup. It's not marked as a 7.0 blocker (because the regression isn't severe anymore after SERVER-75853?) and doesn't look like it has been investigated beyond SERVER-75853.

GraphLookup is sensitive to slowdowns under makePipeline() because it may call the method many times. Per my observations (I haven't spent much time reading the implementation), it is called once for each matched unique value of the "connectFromField" in the local collection, plus once for each document in the local collection that doesn't match anything.
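The call-count observation above can be written down as a small model. This is only a sketch of the heuristic stated in this ticket, not of the server's implementation; the function name estimateMakePipelineCalls and its parameters are hypothetical. The sample data mirrors the 100-document collections from the profiling comment on this ticket.

```javascript
// Model of the observation in this ticket, NOT the server's real algorithm.
// estimateMakePipelineCalls and its parameter names are hypothetical.
function estimateMakePipelineCalls(localDocs, foreignDocs, localField, connectToField) {
  const foreignKeys = new Set(foreignDocs.map((doc) => doc[connectToField]));
  const matchedValues = new Set(); // unique local values that hit a foreign doc
  let unmatchedDocs = 0;           // local docs whose value matches nothing
  for (const doc of localDocs) {
    const value = doc[localField];
    if (foreignKeys.has(value)) {
      matchedValues.add(value);
    } else {
      unmatchedDocs += 1;
    }
  }
  return matchedValues.size + unmatchedDocs;
}

// Benchmark-like data: 100 local docs {_id: i, fkey: i}, 100 foreign docs {_id: i}.
const localDocs = Array.from({ length: 100 }, (_, i) => ({ _id: i, fkey: i }));
const foreignDocs = Array.from({ length: 100 }, (_, i) => ({ _id: i }));

const callsPerAggregate = estimateMakePipelineCalls(localDocs, foreignDocs, "fkey", "_id");
console.log(callsPerAggregate);       // 100
console.log(callsPerAggregate * 250); // 25000 over a 250-iteration loop
```

Under this model even a 100-document benchmark pays the makePipeline() cost about a hundred times per aggregate, so small per-call overheads add up quickly.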

Relevant tests in mongo-perf:

Aggregation.LookupViaGraphLookup
Aggregation.GraphLookupNeighbors
Aggregation.IdentityView.LookupViaGraphLookup
Aggregation.IdentityView.GraphLookupNeighbors
Aggregation.GraphLookupSocialite
Aggregation.IdentityView.GraphLookupSocialite

https://evergreen.mongodb.com/task/sys_perf_6.0_linux_microbenchmarks_standalone_intel.2023_01_aggregation_read_commands_3b9ef88e5a613a7f5c62fb2d0c847482c6a5d85a_23_06_02_14_06_07

https://evergreen.mongodb.com/task/sys_perf_7.0_linux_microbenchmarks_standalone_intel.2023_01_aggregation_read_commands_e54a7ebb88c7c922e9f817e7c23a4b16fcf288af_23_06_02_20_31_12

 



 Comments   
Comment by Irina Yatsenko (Inactive) [ 09/Jun/23 ]

Profiled "for(let i = 0; i < 250; i++) {res = db.l.aggregate([{$graphLookup: {from: "f", startWith: "$fkey", connectFromField: "fkey", connectToField: "_id", as: "match"}}]).toArray();}", where "l" consists of documents {_id: i, fkey: i} and "f" of documents {_id: i}, with i running from 0 to 99, so both collections have 100 docs as in the mongo-perf benchmark:

6.0
Samples: 19K of event 'cpu-cycles'
57.89%    57.89%         10727  mongod             [.] mongo::Pipeline::makePipeline
  6.53%     6.53%          1206  mongod              [.] mongo::Pipeline::parseCommon<mongo::BSONObj>
  1.54%     1.54%           284  mongod              [.] mongo::Pipeline::optimizePipeline
  32.70%    32.70%           6065  mongod              [.] mongo::PipelineD::buildInnerQueryExecutorGeneric
    25.22%    25.22%          4681  mongod               [.] mongo::getExecutor
      13.66%    13.66%          2532  mongod             [.] mongo::QueryPlanner::plan
    4.26%     4.26%           788  mongod              [.] mongo::CanonicalQuery::canonicalize
  8.97%     8.97%          1661  mongod              [.] mongo::AutoGetCollectionForReadCommandMaybeLockFree::AutoGetCollectionForReadCommandMaybeLockFree
  2.77%                          mongod              [.] mongo::AutoGetCollectionForReadCommandMaybeLockFree::~AutoGetCollectionForReadCommandMaybeLockFree

7.0
Samples: 25K of event 'cpu-cycles'
59.07%    59.07%         14565  mongod             [.] mongo::Pipeline::makePipeline // +3838 samples
  5.96%     5.96%          1482  mongod              [.] mongo::Pipeline::parseCommon<mongo::BSONObj> // +276 
  1.73%     1.73%           426  mongod              [.] mongo::Pipeline::optimizePipeline // +142 
  32.30%    32.30%          7968  mongod             [.] mongo::PipelineD::buildInnerQueryExecutorGeneric // +1903  
    24.26%    24.26%         5990  mongod              [.] mongo::getExecutor // +1309 
      13.24%    13.24%         3259  mongod              [.] mongo::QueryPlanner::plan // +727 
    4.81%     4.81%          1180  mongod              [.] mongo::CanonicalQuery::canonicalize // +392 
  9.75%     9.75%          2407  mongod              [.] mongo::AutoGetCollectionForReadCommandMaybeLockFree::AutoGetColl… // +746 
  3.56%                          mongod              [.] boost::optional_detail::optional_base<mongo::AutoGetCollectionForReadCommandMaybeLockFree>::destroy_impl
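
To compare the frames on a relative scale, the sample deltas annotated above can be recomputed along with their percentage growth. This is a rough sketch: it assumes both profiles cover the same 250-iteration loop, so that raw sample counts are comparable between the 6.0 run (19K samples) and the 7.0 run (25K samples). The sample counts are copied from the two listings; the shortened symbol labels are for readability only.

```javascript
// Sample counts copied from the 6.0 and 7.0 perf listings in this comment.
// Each entry: [symbol label, samples in 6.0, samples in 7.0].
const profile = [
  ["Pipeline::makePipeline", 10727, 14565],
  ["Pipeline::parseCommon", 1206, 1482],
  ["Pipeline::optimizePipeline", 284, 426],
  ["PipelineD::buildInnerQueryExecutorGeneric", 6065, 7968],
  ["getExecutor", 4681, 5990],
  ["QueryPlanner::plan", 2532, 3259],
  ["CanonicalQuery::canonicalize", 788, 1180],
  ["AutoGetCollectionForReadCommandMaybeLockFree (ctor)", 1661, 2407],
];

// Absolute and relative growth per frame (assumes comparable workloads).
const growth = profile.map(([symbol, samples60, samples70]) => ({
  symbol,
  delta: samples70 - samples60,
  percent: (100 * (samples70 - samples60)) / samples60,
}));

// Largest relative growth first.
growth.sort((a, b) => b.percent - a.percent);
for (const row of growth) {
  console.log(`${row.symbol}: +${row.delta} samples (+${row.percent.toFixed(1)}%)`);
}
```

Every listed frame grows between the two runs, with makePipeline itself up by about 36%, which is in line with the description's point that there is no single point of regression.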

Generated at Thu Feb 08 06:37:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.