Core Server / SERVER-48169

Optimize distinct count to avoid materializing any values outside the index

    Details

      Description

      Today we have two separate optimized index scans: COUNT_SCAN and DISTINCT_SCAN. COUNT_SCAN returns simple sentinel values as it scans the index, avoiding the cost of translating each index key into the format the query plan needs. DISTINCT_SCAN can seek past large runs of index keys that share the same value, but it still materializes an object outside the index key for consumption by the query plan. These two optimizations could be combined for a query such as:

      // Assume the index {value1: 1, value2: 1, value3: 1} exists, e.g.:
      // db.collection.createIndex({value1: 1, value2: 1, value3: 1})
      db.collection.aggregate([
        { $match: {
            value1: 1,
            value2: { $gte: new Date(1000) }
        }},
        { $group: { _id: "$value3" } },  // one group per distinct value3
        { $count: "distinct" }           // the output field name is arbitrary
      ])
      

      This should improve performance, though by how much is unclear.
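      A minimal sketch of the two mechanisms described above, in the same JavaScript as the example query (runnable under Node.js). It assumes an index can be modeled as an in-memory sorted array of key tuples of plain numbers; `compareKeys`, `seek`, `countDistinctWithSeeks`, and `countDistinctInRange` are hypothetical helpers, not MongoDB server code, and `value2` stands in for the Date bound as epoch milliseconds. The seek-based count only applies when the distinct field immediately follows the equality-bound prefix. With a range on `value2`, as in the query above, equal `value3` values are not contiguous in the index, so the second function scans the bounded keys and deduplicates them with a Set, still without building any document.

```javascript
// Sketch only: an "index" is a sorted array of key tuples of plain numbers.
// None of these helpers are MongoDB server APIs; the names are hypothetical.

// Lexicographic comparison of two key tuples; a proper prefix sorts first.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// A "seek": binary search for the position of the first key >= target.
function seek(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(keys[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// DISTINCT_SCAN-style count over an index {a: 1, b: 1}: count distinct b where
// a === eq, jumping past each run of equal b values instead of reading it.
function countDistinctWithSeeks(keys, eq) {
  let count = 0;
  let i = seek(keys, [eq]); // first key with a >= eq
  while (i < keys.length && keys[i][0] === eq) {
    count++; // keys[i][1] is a new distinct value
    // [eq, b, Infinity] sorts after every key starting with [eq, b]
    // (assuming finite values), so this seek skips the whole run.
    i = seek(keys, [eq, keys[i][1], Infinity]);
  }
  return count;
}

// The issue's query shape over an index {value1: 1, value2: 1, value3: 1}:
// value1 === eq and value2 >= lo; count distinct value3 from index keys only.
function countDistinctInRange(keys, eq, lo) {
  const seen = new Set();
  for (let i = seek(keys, [eq, lo]); i < keys.length && keys[i][0] === eq; i++) {
    seen.add(keys[i][2]); // reads the key tuple only; no document is built
  }
  return seen.size;
}

const abKeys = [[1, 1], [1, 1], [1, 2], [1, 2], [1, 2], [1, 5], [2, 1], [2, 3]];
console.log(countDistinctWithSeeks(abKeys, 1)); // 3

const queryKeys = [[1, 500, 7], [1, 1000, 7], [1, 1000, 8], [1, 2000, 7], [1, 3000, 9], [2, 1500, 4]];
console.log(countDistinctInRange(queryKeys, 1, 1000)); // 3
```

      The Set in `countDistinctInRange` grows with the number of distinct values. A real plan stage would also have to respect BSON type ordering and multikey indexes, which this sketch ignores.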

              People

              Assignee:
              backlog-server-query Backlog - Query Team
              Reporter:
              charlie.swanson Charlie Swanson
              Participants:
              Votes:
              0
              Watchers:
              5

                Dates

                Created:
                Updated: