[SERVER-55153] Optimize $size + ExpressionFieldPath Created: 11/Mar/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

Currently, when evaluating something like {$size: '$a.b'} on a document like {a: [{b:1},{b:2}, ...]}, we first evaluate the ExpressionFieldPath, which produces an array of the values of b, and then ask for the size of that array. This is wasteful, and even more so for '$a.b.c.d' where each level is an array of objects, since only the size of the a array matters in that case. (I'm not arguing that these semantics are good, but that is what we do today.) Instead, we should have an ExpressionSizeOfFieldPath that just returns the size of the top-level array that ExpressionFieldPath would create, but without making any new arrays.

I have a profile where over 20% of the time is spent in the unnecessary ExpressionFieldPath for the simple '$a.b' case. I assume it would be even worse for '$a.b.c.d'.
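
To make the idea concrete, here is a rough Python model of the two approaches. It is a sketch only, not the server's C++ code: dicts stand in for objects, lists for arrays, and the names evaluate_path, resolves, size_of_field_path and MISSING are hypothetical. It assumes that array traversal skips non-object elements and drops elements where the rest of the path is missing, and that a non-array result is reported as None rather than raising the error $size would.

```python
MISSING = object()  # hypothetical sentinel meaning "field not present"


def evaluate_path(doc, path, i=0):
    """Materializing traversal: builds a new array at every array level."""
    value = doc.get(path[i], MISSING)
    if i == len(path) - 1:
        return value
    if isinstance(value, dict):
        return evaluate_path(value, path, i + 1)
    if isinstance(value, list):
        out = []
        for elem in value:
            if isinstance(elem, dict):  # non-object elements are skipped
                nested = evaluate_path(elem, path, i + 1)
                if nested is not MISSING:  # missing results are dropped
                    out.append(nested)
        return out
    return MISSING


def resolves(doc, path, i):
    """True if path[i:] yields a non-missing value; builds no result arrays."""
    value = doc.get(path[i], MISSING)
    if i == len(path) - 1:
        return value is not MISSING
    if isinstance(value, dict):
        return resolves(value, path, i + 1)
    # Traversing an array always yields an array (possibly empty), never missing.
    return isinstance(value, list)


def size_of_field_path(doc, path, i=0):
    """Size of the array evaluate_path would build, or None if not an array."""
    value = doc.get(path[i], MISSING)
    if i == len(path) - 1:
        return len(value) if isinstance(value, list) else None
    if isinstance(value, dict):
        return size_of_field_path(value, path, i + 1)
    if isinstance(value, list):
        # Only the first array on the path is walked; deeper levels are probed.
        return sum(
            1 for e in value if isinstance(e, dict) and resolves(e, path, i + 1)
        )
    return None


doc = {"a": [{"b": 1}, {"b": 2}, {"c": 3}, 7]}
assert evaluate_path(doc, ["a", "b"]) == [1, 2]
assert size_of_field_path(doc, ["a", "b"]) == 2
```

In this model the counting version never allocates the intermediate arrays, and for a deep path it stops descending as soon as it hits the next array, which is the case where the materializing version does the most unnecessary work.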



 Comments   
Comment by Mathias Stearn [ 11/Mar/21 ]

I'm not sure if this is more of a QO thing since it is an optimization, but I assigned it to the QE backlog since 90% of the work is writing the ExpressionSizeOfFieldPath, and there is just a tiny bit of code to recognize the pattern and switch to it. Feel free to reassign if my reasoning is flawed.
