[SERVER-73300] [CQF] Stats generation over dotted path keys does not distinguish between leaf array and array along path Created: 25/Jan/23  Updated: 02/Feb/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nicholas Zolnierz Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Operating System: ALL
Participants:

 Description   

For two documents such as

{a: {b: [1, 2]}},
{a: [{b: 1}, {b: 2}]}

calculating the histogram stats over the key 'a.b' cannot distinguish the two. The counts in the histogram more or less represent the first case, and we lose the information that 'a' was an array of objects with 'b' fields. The reason is that the stats-generating pipeline gets the value of a particular key via a stage such as {$project: {val: <$path>}} before passing it to a group stage; that projection traverses arrays and objects but loses any context along the path.
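
To illustrate the loss of context, here is a minimal Python sketch of dotted-path extraction that implicitly traverses arrays. It is a simplified model for discussion only, not the server's implementation; the function name extract_leaf_values and the variable names are hypothetical.

```python
def extract_leaf_values(value, path):
    """Collect the values reachable via `path` (a list of field names).

    Simplified model (an assumption, not server code): arrays met along the
    path are traversed implicitly, and a leaf array is unwound into its
    elements, so no record of *where* the array occurred survives.
    """
    if not path:
        # End of the path: a leaf array contributes each of its elements.
        return list(value) if isinstance(value, list) else [value]
    if isinstance(value, list):
        # Array along the path: descend into each embedded document.
        out = []
        for item in value:
            if isinstance(item, dict):
                out.extend(extract_leaf_values(item, path))
        return out
    if isinstance(value, dict) and path[0] in value:
        return extract_leaf_values(value[path[0]], path[1:])
    return []


leaf_array_doc = {"a": {"b": [1, 2]}}
array_on_path_doc = {"a": [{"b": 1}, {"b": 2}]}

leaf_values = extract_leaf_values(leaf_array_doc, ["a", "b"])
path_values = extract_leaf_values(array_on_path_doc, ["a", "b"])

# Both documents feed identical values into the histogram.
print(leaf_values, path_values)
```

Under this model both documents produce the same list of values, so anything built purely from the extracted values cannot tell the two shapes apart.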

In practice, this shouldn't affect our cardinality estimation for most queries, since a predicate on the path 'a.b' typically treats both docs as equivalent for matching purposes. The issue comes with $elemMatch, since a query such as {'a.b': {$elemMatch: {$eq: 1}}} should treat the leaf array as a match but not the array of documents.
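
The $elemMatch difference can be sketched with a small Python model of the matching rule described above. This is an assumption-laden simplification, not the server's matcher: resolve_path and elem_match_eq are hypothetical names, and only the $eq case is modelled.

```python
def resolve_path(value, path):
    """Yield each value found at the end of `path`, keeping leaf arrays intact."""
    if not path:
        yield value
        return
    if isinstance(value, list):
        # Array along the path: look inside each embedded document.
        for item in value:
            if isinstance(item, dict):
                yield from resolve_path(item, path)
    elif isinstance(value, dict) and path[0] in value:
        yield from resolve_path(value[path[0]], path[1:])


def elem_match_eq(doc, dotted_path, target):
    """Model of {<path>: {$elemMatch: {$eq: target}}}.

    Assumed rule: the value at the end of the path must itself be an array
    containing an element equal to `target`.
    """
    return any(
        isinstance(v, list) and target in v
        for v in resolve_path(doc, dotted_path.split("."))
    )


leaf_array_doc = {"a": {"b": [1, 2]}}
array_on_path_doc = {"a": [{"b": 1}, {"b": 2}]}

leaf_matches = elem_match_eq(leaf_array_doc, "a.b", 1)
path_matches = elem_match_eq(array_on_path_doc, "a.b", 1)
print(leaf_matches, path_matches)
```

In this model only the leaf-array document matches, which is the distinction a histogram built from flattened values cannot represent.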


Generated at Thu Feb 08 06:24:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.