[SERVER-78580] Improve $project parsing to avoid quadratic behavior Created: 30/Jun/23 Updated: 29/Oct/23 Resolved: 14/Aug/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matt Boros | Assignee: | Matt Boros |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | QE 2023-07-24, QE 2023-08-07, QE 2023-08-21 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 35 | ||||||||||||
| Description |
|
ProjectionPathASTNode getChild performs a linear search through a vector of field names to find the correct child. Because of this search, we see n^2 behavior on the number of fields in the $project. As we add children to this node in addNodeAtPathHelper, we also call getChild to search through everything we've added so far. ProjectionPathASTNode could use a map instead of a vector to avoid this behavior. This ticket should also involve performance testing with a map. If we see a regression, we could consider using a map when there are >50 (or some N) elements. |
| Comments |
| Comment by Githook User [ 07/Aug/23 ] |
|
Author: {'name': 'Matt Boros', 'email': 'matt.boros@mongodb.com', 'username': 'mattBoros'}Message: |
| Comment by Githook User [ 03/Aug/23 ] |
|
Author: {'name': 'Matt Boros', 'email': 'matt.boros@mongodb.com', 'username': 'mattBoros'}Message: |
| Comment by Githook User [ 01/Aug/23 ] |
|
Author: {'name': 'Matthew Boros', 'email': 'mattBoros@users.noreply.github.com', 'username': 'mattBoros'}Message: |
| Comment by Matt Boros [ 27/Jul/23 ] |
|
https://github.com/mongodb/genny/pull/956 for genny workload |
| Comment by Matt Boros [ 19/Jul/23 ] |
|
Decided that |
| Comment by Matt Boros [ 19/Jul/23 ] |
|
My plan for this was to create a generic map class templated on the key/value types and the type of map to use when the number of elements is greater than some threshold. Until that threshold is hit, a vector would be used. This would solve the root problem of this ticket which is ProjectionPathASTNode. It could also be used for However ProjectionPathASTNode needs the inserted data to be kept in order, which is possible with a hash map where the keys are part of a linked list. This is not a data structure provided by the standard library. Our preferred way to get the linked hash map behavior is to use a boost::multi_index_container which does not have the same interface as a regular map, and therefore wouldn't work with the generic container I was thinking of creating. Similarly, the LRU Cache implementation we have maintains its own data structures rather than using a linked hash map. I could I think I will go with #2 but will confirm with someone first. |
| Comment by Matt Boros [ 11/Jul/23 ] |
|
My plan is to use a vector for small projections and when enough nodes are added we switch over to a map. I did some investigation and found n=100 to be a good crossover point. Unfortunately there aren't any jstests in agg/ or core/ that test for this size projection so I'll add one. I'm considering adding a Genny workload as well. Ran a perf patch with workloads we currently have, and as expected it's too noisy to make conclusions. I'll run another patch and test locally to see what's going on. |
| Comment by Matt Boros [ 07/Jul/23 ] |
|
Thought about this and I don't think this ticket belongs in the Bonsai perf project. Will move it out later. |
| Comment by Matt Boros [ 30/Jun/23 ] |
|
For the largest $projects you can create, of the form {$project: {a: 1, b: 1, c: 1, ...}}, this currently takes 30+ minutes to parse. With a proof of concept I have, it takes less than a minute. Proof of concept: https://github.com/10gen/mongo/compare/master...project-parse-fast |