Core Server / SERVER-9507

Optimize $sort+$group+$first pipeline to avoid full index scan

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Query 2018-06-04, Query 2018-08-13, Query 2018-08-27, Query 2018-09-10, Query 2018-09-24, Query 2018-10-08

      Description

      This is an analogue to SERVER-2094 ("distinct cheat with indexes"), but for the aggregation framework.

      This performance improvement would let $group accumulators such as $first take advantage of the fact that the input to the pipeline is already sorted, reducing the number of index entries scanned by skipping over large portions of the input instead of processing every entry.

      For example, suppose a user has a collection with an index {x:1,y:1}, and that x has low cardinality. Consider the following pipeline:

      db.foo.aggregate([{$sort:{x:1,y:1}},{$group:{_id:{x:"$x"},y:{$first:"$y"}}}])

      Currently, the above pipeline performs a full scan of the index. After this optimization, it will only have to scan on the order of |x| index entries, where |x| is the number of distinct values of x. When x has low cardinality, that is much smaller than the size of the index.
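
      The idea can be illustrated outside the server with a minimal sketch. It assumes a sorted in-memory array standing in for the {x:1,y:1} index, and a hypothetical helper firstYPerX (not a server function) that takes the first entry for each x and then seeks directly to the next distinct x rather than walking every entry. This is only an illustration of the skipping technique, not how the server implements it.

```javascript
// Hypothetical stand-in for the {x:1, y:1} index: [x, y] pairs sorted by x, then y.
// firstYPerX is an illustrative helper, not part of MongoDB.
function firstYPerX(index) {
  const results = [];
  let seeks = 0; // number of positions actually visited
  let pos = 0;
  while (pos < index.length) {
    const [x, y] = index[pos];
    // The first entry for this x carries the smallest y, i.e. $first under the sort.
    results.push({ _id: { x: x }, y: y });
    seeks++;
    // Binary search for the first entry whose x is greater than the current x,
    // skipping every remaining entry that shares the current x.
    let lo = pos + 1;
    let hi = index.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (index[mid][0] <= x) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    pos = lo;
  }
  return { results, seeks };
}

// Build 3000 entries with only 3 distinct values of x.
const index = [];
for (let x = 0; x < 3; x++) {
  for (let y = 10; y < 1010; y++) {
    index.push([x, y]);
  }
}

const out = firstYPerX(index);
console.log(out.results.length, out.seeks, index.length);
```

      In this sketch the number of visited positions equals the number of distinct x values, not the number of index entries, which is the behaviour the ticket asks for.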

      This ticket is filed as a result of discussion in SERVER-9272 (full use case available there).

              People

              • Votes: 9
              • Watchers: 24
