[SERVER-55153] Optimize $size + ExpressionFieldPath Created: 11/Mar/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Execution |
| Participants: |
| Description |
|
Currently, when evaluating something like {$size: '$a.b'} on a document like {a: [{b:1},{b:2}, ...]}, we first evaluate the ExpressionFieldPath, which produces an array of the values of b, and then ask for the size of that array. This is wasteful, and even more so for '$a.b.c.d' where each level is an array of objects, since only the size of the a array matters in that case. (I'm not arguing that these semantics are good, but that is what we do today.) Instead we should have an ExpressionSizeOfFieldPath that just returns the size of the top-level array that ExpressionFieldPath would create, but without making any new arrays. I have a profile where over 20% of the time is spent in the unnecessary ExpressionFieldPath for the simple '$a.b' case. I assume it would be even worse for '$a.b.c.d'. |
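To make the proposal concrete, here is a minimal Python sketch of the idea, not the server's C++ code. It models field-path traversal over plain dicts and lists and compares a materializing evaluator with a counting one that allocates no arrays. All names (`MISSING`, `eval_path`, `size_of_field_path`, and the helpers) are hypothetical, and the traversal rules (array elements that lack the field or are scalars contribute nothing, nested arrays always contribute one entry) are assumptions about the semantics rather than something this ticket states.

```python
MISSING = object()  # hypothetical sentinel meaning "field not present"


def eval_path(doc, path):
    """Materializing model: resolve a list of field names against a dict."""
    val = doc.get(path[0], MISSING)
    if len(path) == 1 or val is MISSING:
        return val
    if isinstance(val, dict):
        return eval_path(val, path[1:])
    if isinstance(val, list):
        return eval_path_array(val, path[1:])
    return MISSING  # assumed: cannot traverse through a scalar


def eval_path_array(arr, path):
    """Build a new array with one entry per element that yields a value."""
    out = []
    for item in arr:
        if isinstance(item, dict):
            nested = eval_path(item, path)
            if nested is not MISSING:
                out.append(nested)
        elif isinstance(item, list):
            out.append(eval_path_array(item, path))
    return out


def has_value(doc, path):
    """True if eval_path(doc, path) would not be MISSING; builds nothing."""
    val = doc.get(path[0], MISSING)
    if val is MISSING:
        return False
    if len(path) == 1:
        return True
    if isinstance(val, dict):
        return has_value(val, path[1:])
    return isinstance(val, list)  # an array always yields an array


def count_array(arr, path):
    """Count the entries eval_path_array would emit, without emitting them."""
    n = 0
    for item in arr:
        if isinstance(item, dict):
            if has_value(item, path):
                n += 1
        elif isinstance(item, list):
            n += 1
    return n


def size_of_field_path(doc, path):
    """Size of the array eval_path would return, or None if not an array."""
    val = doc.get(path[0], MISSING)
    if val is MISSING:
        return None
    if len(path) == 1:
        return len(val) if isinstance(val, list) else None
    if isinstance(val, dict):
        return size_of_field_path(val, path[1:])
    if isinstance(val, list):
        return count_array(val, path[1:])
    return None


doc = {"a": [{"b": 1}, {"b": 2}, {"c": 3}]}
assert size_of_field_path(doc, ["a", "b"]) == len(eval_path(doc, ["a", "b"]))
```

In this model the counting version never descends below the first array it meets when every level is an array of objects, which matches the observation that only the outermost array determines the size.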
| Comments |
| Comment by Mathias Stearn [ 11/Mar/21 ] |
|
I'm not sure if this is more of a QO thing since it is an optimization, but I assigned it to the QE backlog since 90% of the work is writing the ExpressionSizeOfFieldPath, and there is just a tiny bit of code to recognize the pattern and switch to it. Feel free to reassign if my reasoning is flawed. |