[SERVER-77942] Performance regressions in $graphLookup due to makePipeline
Created: 09/Jun/23  Updated: 18/Jan/24
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Irina Yatsenko (Inactive) | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qe-perf-90 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Execution |
| Operating System: | ALL |
| Description |
Mongo-perf shows a 15% regression in the LookupViaGraphLookup test between 6.0 and 7.0, and I believe it is caused by various slowdowns under mongo::pipeline::makePipeline. Notably, 7.0 spends more time under mongo::getExecutor and in AutoGetCollectionForReadCommandMaybeLockFree but, unfortunately, there is no single point of regression.

https://jira.mongodb.org/browse/BF-28421 concerns some other Lookup-related benchmarks but considers them not representative of real use cases. I don't think the same reasoning applies to LookupViaGraphLookup: I can reproduce the regression on a collection that represents a binary tree via a parent link, with a query that outputs the direct children of each node, which is a typical scenario in hierarchical datasets.

https://jira.mongodb.org/browse/BF-28050 is another ticket related to regressions in Lookup. It's not marked as a 7.0 blocker (because the regression isn't severe anymore after …).

GraphLookup is sensitive to slowdowns under makePipeline() because it might call that method many times. Per my observations (I haven't spent much time reading the implementation), it is called once for each unique value of the "connectFromField" in the local collection that matches something, plus once for each document in the local collection that doesn't match anything.

Relevant tests in mongo-perf:
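The ticket does not include the binary-tree repro itself. A minimal hypothetical version for the mongo shell might look like the sketch below; the collection name `tree`, the field names `parent` and `children`, the node count, and the use of `maxDepth: 0` are assumptions, not details taken from the ticket.

```javascript
// Hypothetical binary tree stored via parent links: node i has parent floor((i - 1) / 2).
// Collection and field names here are assumptions for illustration only.
function makeBinaryTree(n) {
  const nodes = [];
  for (let i = 0; i < n; i++) {
    // The root (i = 0) has no parent link.
    nodes.push(i === 0 ? { _id: 0 } : { _id: i, parent: Math.floor((i - 1) / 2) });
  }
  return nodes;
}

// For each node, find the documents whose "parent" equals the node's _id.
// maxDepth: 0 stops after the initial match, so only direct children are returned.
function directChildrenPipeline(collName) {
  return [
    {
      $graphLookup: {
        from: collName,
        startWith: "$_id",
        connectFromField: "_id",
        connectToField: "parent",
        as: "children",
        maxDepth: 0,
      },
    },
  ];
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.tree.drop();
  db.tree.insertMany(makeBinaryTree(100));
  const res = db.tree.aggregate(directChildrenPipeline("tree")).toArray();
  printjson(res[0]);
}
```

With `maxDepth: 0`, $graphLookup does not recurse past the `startWith` match, so each node's `children` array holds only its direct children and leaves get an empty array.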
| Comments |
| Comment by Irina Yatsenko (Inactive) [ 09/Jun/23 ] |
Profiled the following loop, where collection "l" consists of documents {_id: i, fkey: i} and collection "f" of documents {_id: i}, with i going from 0 to 99 (100 documents each, as in the mongo-perf benchmark):
for (let i = 0; i < 250; i++) { res = db.l.aggregate([{$graphLookup: {from: "f", startWith: "$fkey", connectFromField: "fkey", connectToField: "_id", as: "match"}}]).toArray(); }
[Profiles: 6.0, 7.0]
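As a rough illustration of the call-count observation from the description, the sketch below builds the benchmark data and applies that heuristic to it. `makeBenchmarkDocs` and `estimateMakePipelineCalls` are hypothetical helpers, and the count reflects only the observed pattern described in the ticket, not anything derived from the server source; actual makePipeline() call counts may differ.

```javascript
// Builds the two collections described above: "l" holds {_id: i, fkey: i}
// and "f" holds {_id: i}, for i from 0 to n - 1.
function makeBenchmarkDocs(n) {
  const local = [];
  const foreign = [];
  for (let i = 0; i < n; i++) {
    local.push({ _id: i, fkey: i });
    foreign.push({ _id: i });
  }
  return { local, foreign };
}

// Heuristic estimate (from the observation in the description, not the implementation):
// one makePipeline() call per distinct matched value of the start field,
// plus one call per local document that matches nothing.
function estimateMakePipelineCalls(local, foreign, startField = "fkey") {
  const foreignIds = new Set(foreign.map((d) => d._id));
  const matchedValues = new Set();
  let unmatchedDocs = 0;
  for (const doc of local) {
    const v = doc[startField];
    if (foreignIds.has(v)) matchedValues.add(v);
    else unmatchedDocs++;
  }
  return matchedValues.size + unmatchedDocs;
}

const { local, foreign } = makeBenchmarkDocs(100);
const perAggregate = estimateMakePipelineCalls(local, foreign);
const iterations = 250; // matches the profiled loop
console.log(perAggregate, perAggregate * iterations); // prints "100 25000"

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.l.drop();
  db.f.drop();
  db.l.insertMany(local);
  db.f.insertMany(foreign);
}
```

Under this heuristic every fkey value in "l" matches a document in "f", so each aggregate would call makePipeline() about 100 times, or about 25,000 times across the 250-iteration loop, which is why per-call slowdowns add up.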