[SERVER-27115] Track fields renamed by $project in aggregation for index consideration Created: 18/Nov/16 Updated: 12/Dec/22 Resolved: 30/May/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Chris Harris | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | bi-performance, neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Sprint: | Query 2017-03-27, Query 2017-04-17, Query 2017-05-08, Query 2017-05-29, Query 2017-06-19 |
| Participants: |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
|
Simply renaming a field in a $project stage currently disqualifies that field from index consideration: a $match on the new name that follows the $project cannot use an index on the original field. The database should track simple field renames in order to preserve these index options. This is particularly relevant because implementing this feature would allow more flexibility in how logically equivalent aggregation pipelines are written while optimal performance is maintained. |
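The rename tracking described above can be sketched in a few lines of Python. This is a hypothetical model, not the server's implementation: stage specs are plain dicts, and `simple_renames` and `rewrite_match_through_renames` are illustrative names. Following the conservative first step discussed in the comments (no dotted paths), it only rewrites a $match whose every top-level path is a non-dotted field renamed from another non-dotted field; anything else returns `None`, meaning "do not swap".

```python
from typing import Any, Dict, Optional


def simple_renames(project_spec: Dict[str, Any]) -> Dict[str, str]:
    """Map each output field to its source path for entries like {"c": "$a"}.

    Variables such as "$$ROOT", inclusions/exclusions (1, 0, True, False) and
    other computed expressions are not treated as renames.
    """
    renames: Dict[str, str] = {}
    for out_field, expr in project_spec.items():
        if isinstance(expr, str) and expr.startswith("$") and not expr.startswith("$$"):
            renames[out_field] = expr[1:]
    return renames


def rewrite_match_through_renames(
    project_spec: Dict[str, Any], match_spec: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return an equivalent $match over the pre-$project field names, or None.

    None means "do not swap": the $match has to stay after the $project.
    """
    renames = simple_renames(project_spec)
    rewritten: Dict[str, Any] = {}
    for path, predicate in match_spec.items():
        source = renames.get(path)
        if source is None:
            # Not a simple rename, or a top-level operator such as $or.
            return None
        if "." in path or "." in source:
            # Dotted paths can cross arrays, where $project reshapes values.
            return None
        if source in rewritten:
            # Two predicates would collapse onto one key; keep the sketch simple.
            return None
        rewritten[source] = predicate
    return rewritten


print(rewrite_match_through_renames({"c": "$a", "_id": 0}, {"c": {"$gt": 5}}))
# {'a': {'$gt': 5}}
print(rewrite_match_through_renames({"c": "$a.b"}, {"c": 1}))  # None
```

When a rewritten predicate comes back, it can run in a $match placed before the unchanged $project, where an index on the source field becomes usable.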
| Comments |
| Comment by Githook User [ 30/May/17 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com> Message: |
| Comment by David Storch [ 23/May/17 ] |
|
asya, I've found another example in which swapping is incorrect when the renamed-from path is dotted. This time, the match expression that causes the problem simply tests equality to a scalar; no $elemMatch or comparison to an array is required:
In this case, the swap causes us to return additional results rather than miss results. I don't think it's safe to perform the swap when the renamed-from path is dotted unless we can somehow know that there are no arrays along that path. |
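The scalar-equality case can be illustrated with a small Python model. This is a hypothetical sketch, not MongoDB code, and the document is an assumed input chosen to show the problem: aggregation evaluates "$a.b" by mapping over array `a` and keeps each `b` array intact, while the query matcher compares a scalar against each `b` and against the elements of each `b`. The helper names are illustrative.

```python
# Hypothetical input: the renamed-from path a.b crosses an array and ends at
# another array.
doc = {"a": [{"b": [1, 2]}]}


def agg_a_b(d):
    """$project {x: "$a.b"}: map over array a, keeping each b value as-is."""
    return [elem["b"] for elem in d["a"] if isinstance(elem, dict) and "b" in elem]


def query_eq_scalar(value, target):
    """{path: target} on one value: equal, or equal to an element of an array."""
    return value == target or (isinstance(value, list) and target in value)


# Original order: $project, then $match {x: 1}.
x = agg_a_b(doc)  # [[1, 2]]
project_then_match = query_eq_scalar(x, 1)  # False: the only element is [1, 2]

# Swapped order: $match {"a.b": 1} on the input document. The matcher visits
# each b reached through array a and then looks one array level into it.
match_then_project = any(
    query_eq_scalar(elem["b"], 1)
    for elem in doc["a"]
    if isinstance(elem, dict) and "b" in elem
)  # True: 1 is an element of b

print(project_then_match, match_then_project)  # False True
```

The swapped pipeline therefore returns a document that the original pipeline filters out.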
| Comment by Githook User [ 19/May/17 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com> Message: |
| Comment by David Storch [ 20/Apr/17 ] |
|
Remaining tasks: |
| Comment by Githook User [ 18/Apr/17 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com> Message: If a field renamed by $project or $addFields is used in a |
| Comment by David Storch [ 13/Apr/17 ] |
|
asya, got it. I think it should be possible to implement this optimization in some cases for renames of dotted paths, with some additional checks related to whether or not the $match involves arrays. My current plan is to first implement the optimization without support for dotted paths, and then extend it to allow dotted paths in certain cases. |
| Comment by Asya Kamsky [ 12/Apr/17 ] |
This is matching a literal array, so I guess dotted notation and any sort of array matching are not compatible. My concern is that a common pattern is renaming dotted fields to remove the dot for compatibility with third-party tools, for example: |
| Comment by Githook User [ 12/Apr/17 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com> Message: |
| Comment by David Storch [ 11/Apr/17 ] |
|
asya, good question. It is true that $elemMatch predicates have never been able to participate in our $match splitting/swapping optimizations. However, I think the problem affects dotted paths regardless of whether there is an $elemMatch predicate. This example, similar to the one above, should show why: |
| Comment by Asya Kamsky [ 11/Apr/17 ] |
|
Isn't it the combination of arrays and $elemMatch in $match that's not optimizable? Wouldn't non-array matching operators work as expected? |
| Comment by David Storch [ 11/Apr/17 ] |
|
This optimization is not correct when the field path being renamed is dotted. Consider the following collection:
Suppose we have a pipeline that "renames" the path "a.b" to "c.d". Because of how the $project stage handles arrays, this is not merely a rename but a reshaping of the document:
Now consider the same pipeline, where the $project is followed by a $match on the newly created path "c.d":
On the surface, it may look correct to rewrite this as a $match followed by a $project, where the $match is rewritten to be on path "a.b" instead of path "c.d". However, this rewrite would be incorrect, since it would cause us to miss the matching document:
This limitation is due to the presence of arrays along the renamed path. If there are no arrays along the renamed path, then I believe the optimization is always valid. However, aggregation cannot know a priori which field paths may contain arrays. Although |
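The reshaping and the missed match described above can be reproduced with a small Python model. It is a hypothetical sketch, not MongoDB code: the input document is an assumed example in which "a.b" crosses an array, $project is limited to {"out.path": "$source.path"} renames, only equality predicates are modeled (no null/missing semantics, `_id`, or numeric path components), and all helper names are illustrative.

```python
from typing import Any, Iterator, List

MISSING = object()  # sentinel: the path does not exist in the document


def agg_path(value: Any, parts: List[str]) -> Any:
    """Evaluate an aggregation field path such as "$a.b" (passed as ["a", "b"])."""
    if not parts:
        return value
    if isinstance(value, dict):
        return agg_path(value[parts[0]], parts[1:]) if parts[0] in value else MISSING
    if isinstance(value, list):
        # Arrays are mapped element by element and missing results are dropped,
        # so [{"b": 1}, {"b": 2}] evaluated at "b" becomes [1, 2].
        results = [agg_path(e, parts) for e in value if isinstance(e, (dict, list))]
        return [r for r in results if r is not MISSING]
    return MISSING


def query_values(value: Any, parts: List[str]) -> Iterator[Any]:
    """Yield the values a $match predicate on a dotted path is compared against."""
    if not parts:
        yield value
    elif isinstance(value, dict) and parts[0] in value:
        yield from query_values(value[parts[0]], parts[1:])
    elif isinstance(value, list):
        for elem in value:
            if isinstance(elem, dict):  # only embedded documents are descended into
                yield from query_values(elem, parts)


def query_eq(doc: dict, path: str, target: Any) -> bool:
    """{path: target} matches the whole value or any element of an array value."""
    return any(
        v == target or (isinstance(v, list) and target in v)
        for v in query_values(doc, path.split("."))
    )


def project(doc: dict, spec: dict) -> dict:
    """Apply a $project made only of {"out.path": "$source.path"} entries."""
    out: dict = {}
    for out_path, expr in spec.items():
        value = agg_path(doc, expr[1:].split("."))
        if value is MISSING:
            continue  # a missing source produces no output field
        *parents, leaf = out_path.split(".")
        node = out
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return out


def project_then_match(docs: List[dict], spec: dict, path: str, target: Any) -> List[dict]:
    """The pipeline as written: $project, then $match {path: target}."""
    projected = [project(doc, spec) for doc in docs]
    return [d for d in projected if query_eq(d, path, target)]


def match_then_project(docs: List[dict], spec: dict, path: str, target: Any) -> List[dict]:
    """The proposed swap: $match on the renamed-from path, then $project."""
    source = spec[path][1:]
    return [project(doc, spec) for doc in docs if query_eq(doc, source, target)]


docs = [{"a": [{"b": 1}, {"b": 2}]}]
spec = {"c.d": "$a.b"}  # "renames" a.b to c.d
print(project(docs[0], spec))  # {'c': {'d': [1, 2]}}
print(project_then_match(docs, spec, "c.d", [1, 2]))  # [{'c': {'d': [1, 2]}}]
print(match_then_project(docs, spec, "c.d", [1, 2]))  # []
```

After the $project, "c.d" holds the whole array [1, 2], so a literal-array match succeeds; before it, the matcher only sees the individual values 1 and 2, so the swapped pipeline misses the document. With a non-dotted rename such as {"c": "$a"}, both orders see the same value.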