[SERVER-86248] Better index tie breaking in group distinct scan Created: 05/Feb/24  Updated: 06/Feb/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ben Shteinfeld Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

In the bestbuy high-value workload distinct_types_with_in_predicate-useAgg, we run an aggregation

db.products.aggregate([{$match: {type: {$in: ["Game", "Movie", "Music", "HardGood"]}}}, {$group: {_id: "$type"}}]) 

which results in a plan

 

 

  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'bestbuy.products',
          indexFilterSet: false,
          parsedQuery: { type: { '$in': [ 'Game', 'HardGood', 'Movie', 'Music' ] } },
          queryHash: '24FB8A6A',
          planCacheKey: '6DBEACC2',
          optimizationTimeMillis: 0,
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            isCached: false,
            stage: 'PROJECTION_COVERED',
            transformBy: { type: 1, _id: 0 },
            inputStage: {
              stage: 'DISTINCT_SCAN',
              keyPattern: { type: 1, subclass: 1, genre: 1, format: 1 },
              indexName: 'type_1_subclass_1_genre_1_format_1',
              isMultiKey: false,
              multiKeyPaths: { type: [], subclass: [], genre: [], format: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                type: [
                  '["Game", "Game"]',
                  '["HardGood", "HardGood"]',
                  '["Movie", "Movie"]',
                  '["Music", "Music"]'
                ],
                subclass: [ '[MinKey, MaxKey]' ],
                genre: [ '[MinKey, MaxKey]' ],
                format: [ '[MinKey, MaxKey]' ]
              }
            }
          },
          rejectedPlans: []
        }
      }
    },
    { '$groupByDistinctScan': { newRoot: { _id: '$type' } } }
  ],

However, there is a shorter index which can satisfy the query that the planner does not consider,

 

 

db.products.aggregate([{$match: {type: {$in: ["Game", "Movie", "Music", "HardGood"]}}}, {$group: {_id: "$type"}}], {hint: {type: 1, name: 1}})

  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'bestbuy.products',
          indexFilterSet: false,
          parsedQuery: { type: { '$in': [ 'Game', 'HardGood', 'Movie', 'Music' ] } },
          queryHash: '24FB8A6A',
          planCacheKey: '6DBEACC2',
          optimizationTimeMillis: 0,
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            isCached: false,
            stage: 'PROJECTION_COVERED',
            transformBy: { type: 1, _id: 0 },
            inputStage: {
              stage: 'DISTINCT_SCAN',
              keyPattern: { type: 1, name: 1 },
              indexName: 'type_1_name_1',
              isMultiKey: false,
              multiKeyPaths: { type: [], name: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                type: [
                  '["Game", "Game"]',
                  '["HardGood", "HardGood"]',
                  '["Movie", "Movie"]',
                  '["Music", "Music"]'
                ],
                name: [ '[MinKey, MaxKey]' ]
              }
            }
          },
          rejectedPlans: []
        }
      }
    },
    { '$groupByDistinctScan': { newRoot: { _id: '$type' } } }
  ], 

We should use the shorter index in this case.


Generated at Thu Feb 08 06:59:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.