[SERVER-45308] Alphabetical order of field names used in an $or clause drives evaluation order and thus affects performance Created: 27/Dec/19 Updated: 07/Jan/20 Resolved: 07/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | William Byrne III | Assignee: | Charlie Swanson |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | qopt-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Sprint: | Query 2020-01-13 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||
| Description |
| Comments |
| Comment by Charlie Swanson [ 07/Jan/20 ] | |
|
And to further elaborate one point: william.byrne is correct that the query planner sorts the predicates for caching purposes, but one could imagine the planner distinguishing between the "canonical" or caching order of the predicates from the actual execution order. This would certainly not be easy to do, but I think other systems have such analysis so it's a reasonable request. | |
| Comment by Charlie Swanson [ 07/Jan/20 ] | |
|
Ok I'm going to close this as a duplicate of SERVER-45364. I think that's the closest in similarity. As bduncan@visualmining.com points out, there are possible changes the server could take to pick a better/more efficient ordering. This is definitely related to SERVER-37530 as a workaround if the user knows a better order, so I will also link that. | |
| Comment by Charlie Swanson [ 07/Jan/20 ] | |
|
charlie.swanson to read up and make the appropriate number of duplicates. | |
| Comment by Bruce Duncan [ 29/Dec/19 ] | |
|
This issue is based on one of my support cases. I believe this issue can be considered a duplicate of SERVER-37530, just as Though william.byrne says
I would disagree. You are presumably sorting the elements in order to improve the chances of matching a cached query plan where the sorted query itself is the cache key, and then using the sorted query elements in the actual plan. There are two possible approaches I can think of here: 1. Leave the sorting in place, but use ascending operation cost order instead of ascending alphabetical order. william.byrne points out that you do not know if a field is an array or not ahead of time. To me, that is another way of saying that you don't know the cost of the operation for a given field. Mongo could certainly keep a cache of discovered schema information obtained while executing queries (`x` is an array with sizes min-max, `y` is a scalar`, etc), and use it to derive a cost for non-index covered field operations that could then be used for the sorting. 2. Leave the sorting in place, but only for objects, not an arrays (i.e. the values of the `$and` and `$or` operators). Yes, this will increase the number of queries that need to have query plans calculated if the queries contain operators with array values that have varying element orders. However, for customers who can ensure a small cardinality of operator array element orders (such as those generating their queries programmatically) this suggested change would not have a significant negative impact on performance. Others would encounter a performance hit the first time a new formulation of a query is executed, but not on subsequent executions. This is not dissimilar to the hit people already deal with related to warming the cache. |