  1. Core Server
  2. SERVER-99617

Boost productivity of aggregation DISTINCT_SCAN plans

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • Fully Compatible
    • QO 2025-02-03, QO 2025-02-17

      When we issue a distinct() command, the command logic always tries to dedup the results returned by the cursor, regardless of whether those results can still contain duplicates (i.e. we could sometimes directly return the output of the DISTINCT_SCAN). In contrast, for an aggregation, the $groupByDistinct rewrite eliminates this excess work when we generate a DISTINCT_SCAN.
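The difference in post-scan work can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: `distinct_scan`, `distinct_command`, and `aggregation_group` are hypothetical names, the index is modeled as a sorted list of keys, and the scan is assumed to already yield unique values (which, as noted above, is only sometimes the case).

```python
from itertools import groupby


def distinct_scan(sorted_index_keys):
    """Toy DISTINCT_SCAN (hypothetical): yield each key once from a sorted index."""
    for key, _group in groupby(sorted_index_keys):
        yield key


def distinct_command(cursor):
    """Toy distinct() command (hypothetical): always dedups the cursor output.

    Returns the values plus the number of dedup checks performed.
    """
    seen = set()
    values = []
    dedup_checks = 0
    for value in cursor:
        dedup_checks += 1  # this check is wasted when the scan is already unique
        if value not in seen:
            seen.add(value)
            values.append(value)
    return values, dedup_checks


def aggregation_group(cursor):
    """Toy aggregation after the rewrite (hypothetical): return scan output as-is."""
    return list(cursor), 0


index_keys = [1, 1, 2, 3, 3, 3]
distinct_result = distinct_command(distinct_scan(index_keys))
aggregation_result = aggregation_group(distinct_scan(index_keys))
```

Both paths produce the same values in this model; only the distinct() path pays for a dedup check per returned value.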

      For this reason, when ranking plans for aggregations, we need to favor plans that use a DISTINCT_SCAN over plans that don't more strongly than we do for distinct() commands. Two plans may have an equivalent productivity measure, yet NOT require similar work on the output of the cursor. We may also want to consider eliminating index scan candidates completely for aggregations.
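One way to picture the proposed prioritization is a tie-breaking bonus in plan scoring. The following is a minimal sketch, not the actual plan ranker: `PlanStats`, `score`, `pick_plan`, and the value of `DISTINCT_SCAN_BONUS` are all hypothetical, and productivity is assumed here to be results returned per unit of work during a trial run.

```python
from dataclasses import dataclass


@dataclass
class PlanStats:
    name: str
    advanced: int             # results returned during the trial (assumed metric)
    works: int                # work units spent during the trial (assumed metric)
    uses_distinct_scan: bool


# Hypothetical bonus; the ticket does not specify how large the boost should be.
DISTINCT_SCAN_BONUS = 0.25


def productivity(plan: PlanStats) -> float:
    """Results per unit of work; 0.0 if the plan did no work."""
    return plan.advanced / plan.works if plan.works else 0.0


def score(plan: PlanStats, is_aggregation: bool) -> float:
    """Boost DISTINCT_SCAN plans only for aggregations, where no dedup follows."""
    base = productivity(plan)
    if is_aggregation and plan.uses_distinct_scan:
        return base + DISTINCT_SCAN_BONUS
    return base


def pick_plan(plans, is_aggregation: bool) -> PlanStats:
    """Return the highest-scoring plan; ties go to the earliest candidate."""
    return max(plans, key=lambda p: score(p, is_aggregation))


ixscan = PlanStats("IXSCAN", advanced=10, works=20, uses_distinct_scan=False)
dscan = PlanStats("DISTINCT_SCAN", advanced=10, works=20, uses_distinct_scan=True)
```

With equal productivity, `pick_plan([ixscan, dscan], is_aggregation=True)` selects the DISTINCT_SCAN plan, while the same call with `is_aggregation=False` leaves the tie unbroken and returns the first candidate.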

            Assignee:
            alya.berciu@mongodb.com Alya Berciu
            Reporter:
            alya.berciu@mongodb.com Alya Berciu
