  1. Core Server
  2. SERVER-9507

Optimize $sort+$group+$first pipeline to avoid full index scan

    • Fully Compatible
    • Query 2018-06-04, Query 2018-08-13, Query 2018-08-27, Query 2018-09-10, Query 2018-09-24, Query 2018-10-08

      This is an analogue to SERVER-2094 ("distinct cheat with indexes"), but for the aggregation framework.

      This performance improvement allows $group accumulators such as $first to take advantage of the fact that the input to the pipeline is sorted, and thus to reduce the number of index entries scanned by "skipping" over the large portions of the input that cannot affect the result.

      For example, suppose a user has a collection with an index {x:1,y:1}, and that x has low cardinality. Consider the following pipeline:

      db.foo.aggregate([{$sort: {x: 1, y: 1}}, {$group: {_id: {x: "$x"}, y: {$first: "$y"}}}])

      Currently, the above pipeline performs a full scan of the index. After this optimization, it will only have to scan on the order of |x| index entries, where |x| is the number of distinct values of x. Because x has low cardinality, |x| is much smaller than the size of the index.
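
      The difference between the two scans can be sketched in plain JavaScript. This is only an illustration of the idea, not the server's implementation: the index is modelled as an array sorted by (x, y), and the names buildIndex, firstYFullScan, seekPastX and firstYSkipScan are hypothetical. The binary search stands in for a B-tree seek to the next distinct value of x.

```javascript
// Simulated {x: 1, y: 1} index: entries sorted by x, then by y.
// All names in this sketch are hypothetical; none of this is server code.
function buildIndex(distinctX, perX) {
  const entries = [];
  for (let x = 0; x < distinctX; x++) {
    for (let y = 0; y < perX; y++) {
      entries.push({ x: x, y: y });
    }
  }
  return entries;
}

// Full scan: visit every entry and keep the first y seen for each x.
function firstYFullScan(index) {
  const result = new Map();
  let examined = 0;
  for (const entry of index) {
    examined++;
    if (!result.has(entry.x)) {
      result.set(entry.x, entry.y);
    }
  }
  return { result: result, examined: examined };
}

// Binary search for the first position at or after `lo` whose x is
// greater than the given x. Stands in for a B-tree seek; its probes are
// not counted as scanned entries.
function seekPastX(index, x, lo) {
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].x <= x) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Skip scan: read the first entry of each x group, then seek directly
// to the start of the next group instead of walking the rest of it.
function firstYSkipScan(index) {
  const result = new Map();
  let examined = 0;
  let pos = 0;
  while (pos < index.length) {
    const entry = index[pos];
    examined++;
    result.set(entry.x, entry.y);
    pos = seekPastX(index, entry.x, pos + 1);
  }
  return { result: result, examined: examined };
}

const index = buildIndex(3, 1000);
const full = firstYFullScan(index);
const skip = firstYSkipScan(index);
console.log(full.examined, skip.examined);
```

      Both functions return the same groups. In this model the full scan lands on every entry, while the skip scan lands on one entry per distinct value of x, at the cost of one seek per group.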

      This ticket is filed as a result of discussion in SERVER-9272 (full use case available there).

            Assignee: Justin Seyster (justin.seyster@mongodb.com)
            Reporter: Backlog - Query Team (Inactive) (backlog-server-query)
            Votes: 9
            Watchers: 23

              Created:
              Updated:
              Resolved: