  1. Core Server
  2. SERVER-9507

Optimize $sort+$group+$first pipeline to avoid full index scan


Details

    • Fully Compatible
    • Query 2018-06-04, Query 2018-08-13, Query 2018-08-27, Query 2018-09-10, Query 2018-09-24, Query 2018-10-08

    Description

      This is an analogue to SERVER-2094 ("distinct cheat with indexes"), but for the aggregation framework.

      This performance improvement lets $group accumulators such as $first take advantage of input that is already sorted on the group key. Because only the first entry of each group is needed, the scan can skip directly to the start of the next group instead of reading every index entry in between.

      For example, suppose a user has a collection with an index {x:1,y:1}, and that x has low cardinality. Consider the following pipeline:

      db.foo.aggregate([{$sort: {x: 1, y: 1}}, {$group: {_id: {x: "$x"}, y: {$first: "$y"}}}])

      Currently, the above pipeline performs a full scan of the index. After this optimization, it will only have to scan on the order of |x| index entries, where |x| is the number of distinct values of x. For a low-cardinality x, that is much smaller than the total number of entries in the index.
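      The skipping idea can be sketched outside the server. The following is a toy model only, not MongoDB's implementation: the index on {x:1,y:1} is assumed to be a plain sorted array of [x, y] keys, and the names nextGroupStart and firstPerGroup are hypothetical. A binary search stands in for the B-tree seek to the next distinct value of x.

```javascript
// Toy model: an index on {x: 1, y: 1} represented as [x, y] keys in sorted order.
// nextGroupStart and firstPerGroup are illustrative names, not server internals.

// Binary search for the first position at or after `from` whose x is greater than `x`.
// This plays the role of an index seek past the current group.
function nextGroupStart(index, from, x) {
  let lo = from;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid][0] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Emulates {$sort: {x: 1, y: 1}} followed by {$group: {_id: {x: "$x"}, y: {$first: "$y"}}}.
// `examined` counts the keys read as group results; seeks are not counted.
function firstPerGroup(index) {
  const results = [];
  let examined = 0;
  let pos = 0;
  while (pos < index.length) {
    const [x, y] = index[pos];
    examined++;
    results.push({ _id: { x }, y });
    // Skip the rest of this group instead of walking it entry by entry.
    pos = nextGroupStart(index, pos + 1, x);
  }
  return { results, examined };
}

// Build 3000 sorted keys with only 3 distinct values of x.
const index = [];
for (let x = 0; x < 3; x++) {
  for (let y = 0; y < 1000; y++) index.push([x, y * 2 + x]);
}

const { results, examined } = firstPerGroup(index);
console.log(results, examined);
```

      In this model one key is read per distinct value of x, plus a logarithmic-cost seek per group, in place of a walk over all 3000 keys.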

      This ticket is filed as a result of discussion in SERVER-9272 (full use case available there).


          People

            justin.seyster@mongodb.com Justin Seyster
            backlog-server-query Backlog - Query Team (Inactive)
            Votes: 9
            Watchers: 23
