  1. Core Server
  2. SERVER-73300

[CQF] Stats generation over dotted path keys does not distinguish between leaf array and array along path

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Optimization
    • ALL

      For two documents such as

      {a: {b: [1, 2]}},
      {a: [{b: 1}, {b: 2}]}
      

      histogram stats calculated over the key 'a.b' cannot distinguish the two. The counts in the histogram more or less represent the first case, and we lose the information that 'a' was an array of objects with 'b' fields. The cause is that the stats-generating pipeline extracts the value at a given key via a stage such as {$project: {val: <$path>}} before passing it to a group stage. That projection traverses arrays and objects but discards any context along the path.
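
      To illustrate the loss of context, here is a minimal Python sketch of how a field-path projection flattens both documents to the same value. The helper resolve_path and the MISSING sentinel are hypothetical names made up for this illustration. The traversal is a simplified model of the behavior described above, not the server's actual implementation.

```python
# Simplified model of evaluating a dotted field path such as "$a.b".
# resolve_path and MISSING are hypothetical names used only for illustration.
MISSING = object()


def resolve_path(value, parts):
    """Return the value found at the dotted path, mapping over arrays on the way."""
    if not parts:
        return value
    if isinstance(value, list):
        # An array along the path: apply the rest of the path to each element
        # and keep whatever resolves. The fact that we crossed an array is lost.
        resolved = [resolve_path(elem, parts) for elem in value]
        return [r for r in resolved if r is not MISSING]
    if isinstance(value, dict) and parts[0] in value:
        return resolve_path(value[parts[0]], parts[1:])
    return MISSING


leaf_array_doc = {"a": {"b": [1, 2]}}
array_along_path_doc = {"a": [{"b": 1}, {"b": 2}]}

leaf_result = resolve_path(leaf_array_doc, ["a", "b"])
along_result = resolve_path(array_along_path_doc, ["a", "b"])

# Both documents project to the same value, so any histogram built from
# the projected values sees them as identical.
print(leaf_result, along_result, leaf_result == along_result)
```

      Under this model, anything downstream of the projection only sees the flattened value, which is why the two shapes become indistinguishable in the stats.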

      In practice, this shouldn't affect our cardinality estimation for most queries, since matching on the path 'a.b' typically treats both docs as equivalent. The issue arises with $elemMatch: a query such as {'a.b': {$elemMatch: {$eq: 1}}} should match the leaf array but not the array of documents.
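
      The difference between the two predicates can be sketched in Python as follows. The helpers leaf_values, matches_eq, and matches_elem_match_eq are hypothetical names for this illustration, and the matching rules are a simplified model of the semantics described above rather than the server's matcher.

```python
# Simplified model of match semantics over a dotted path.
# All function names here are hypothetical and exist only for illustration.


def leaf_values(value, parts):
    """Collect the values reached at the end of the path, without flattening them."""
    if not parts:
        return [value]
    if isinstance(value, list):
        # Array along the path: descend into each element separately.
        found = []
        for elem in value:
            found.extend(leaf_values(elem, parts))
        return found
    if isinstance(value, dict) and parts[0] in value:
        return leaf_values(value[parts[0]], parts[1:])
    return []


def matches_eq(doc, path, target):
    """Model of {path: target}: a leaf equals target or is an array containing it."""
    return any(
        leaf == target or (isinstance(leaf, list) and target in leaf)
        for leaf in leaf_values(doc, path.split("."))
    )


def matches_elem_match_eq(doc, path, target):
    """Model of {path: {$elemMatch: {$eq: target}}}: the leaf itself must be an array."""
    return any(
        isinstance(leaf, list) and target in leaf
        for leaf in leaf_values(doc, path.split("."))
    )


leaf_array_doc = {"a": {"b": [1, 2]}}
array_along_path_doc = {"a": [{"b": 1}, {"b": 2}]}

for doc in (leaf_array_doc, array_along_path_doc):
    print(doc, matches_eq(doc, "a.b", 1), matches_elem_match_eq(doc, "a.b", 1))
```

      In this model plain equality matches both documents, while the $elemMatch form matches only the document whose leaf value is itself an array, which is the distinction the histogram cannot currently express.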

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            nicholas.zolnierz@mongodb.com Nicholas Zolnierz
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: