Avoid FETCHing to resolve not rooted OR predicate whose information is contained in tail of compound index

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None

      When there's a query with an OR that is not rooted and the selected index is compound and its tail could be used to resolve such predicate, planner adds a FETCH stage to resolve it instead.

       

      MongoDB Enterprise > db.test.find()
      { "_id" : ObjectId("67e141b8305a95c90411b3c8"), "Status" : 3, "UnitId" : 203 }
      { "_id" : ObjectId("67e141b8305a95c90411b3c9"), "Status" : 3, "UnitId" : 606 }
      
      
      MongoDB Enterprise > db.test.createIndex({Status:1, UnitId: 1})
      {
          "numIndexesBefore" : 1,
          "numIndexesAfter" : 2,
          "createdCollectionAutomatically" : false,
          "ok" : 1
      }
      
      
      MongoDB Enterprise > db.test.find(
      ...   { "Status": NumberInt(3),
      ...     "$or": [
      ...       { "UnitId": { "$gte": 100.0 } },
      ...       { "UnitId": { "$gte": 6.0 } }
      ...     ]
      ...   },
      ...   {
      ...     "_id": 0.0,
      ...     "UnitId": 1.0
      ...   }
      ... ).explain("executionStats").queryPlanner.winningPlan
      
      
      {
          "isCached" : false,
          "stage" : "PROJECTION_SIMPLE",
          "transformBy" : {
              "_id" : 0,
              "UnitId" : 1
          },
          "inputStage" : {
              "stage" : "FETCH",
              "filter" : {
                  "$or" : [
                      {
                          "UnitId" : {
                              "$gte" : 100
                          }
                      },
                      {
                          "UnitId" : {
                              "$gte" : 6
                          }
                      }
                  ]
              },
              "inputStage" : {
                  "stage" : "IXSCAN",
                  "keyPattern" : {
                      "Status" : 1,
                      "UnitId" : 1
                  },
                  "indexName" : "Status_1_UnitId_1",
                  "isMultiKey" : false,
                  "multiKeyPaths" : {
                      "Status" : [ ],
                      "UnitId" : [ ]
                  },
                  "isUnique" : false,
                  "isSparse" : false,
                  "isPartial" : false,
                  "indexVersion" : 2,
                  "direction" : "forward",
                  "indexBounds" : {
                      "Status" : [
                          "[3, 3]"
                      ],
                      "UnitId" : [
                          "[MinKey, MaxKey]"
                      ]
                  }
              }
          }
      } 

      We see that the planner is adding a FETCH stage to resolve the OR condition when it could simply be pushed down to the IXSCAN stage.

       

       

      The reason for this lies in a combination of several things:

      1. plan enumerator is not tagging the OR subtree with a selected index since when it is tagged it is a part of an orPushDown optimization 
      2. It is later processed as a leftover child by QueryPlannerAccess::buildIndexedAnd method since it has no tag and it is impossible, at that point, to know if a tag could possibly exist.
      3. It is not easy to rewrite it like we do when pushing projections beneath sort since we'd need to know if the index would be useful.

      In particular, I've deeply investigated how the plan enumerator works with this query and in total it produces three solutions.

      This is the initial memo

       

      [Node #1]: AND enumstate counter 0
                      choice 0:
                          subnodes:
                          idx[0]
                              pos 1 pred UnitId $gte 100.0
                          orPushdownPred: Status $eq 3            
      [Node #2]: AND enumstate counter 0
                      choice 0:
                          subnodes:
                          idx[0]
                              pos 1 pred UnitId $gte 6.0
                          orPushdownPred: Status $eq 3            
      [Node #3]: ALL OF (lockstep): {
                      totalEnumerated: 0
                      exhaustedLockstepIteration: 0
                      subnodes: [
                          {memoId: 1, iterationCount: 0, maxIterCount: none},
                          {memoId: 2, iterationCount: 0, maxIterCount: none},
                  ]
      [Node #4]: AND enumstate counter 0
                      choice 0:
                          subnodes: 3                
      
                      choice 1:
                          subnodes:
                          idx[0]
                              pos 0 pred Status $eq 3                
             
                      choice 2:
                          subnodes: 3
                          idx[0]
                              pos 0 pred Status $eq 3 

       

      Node 4 is the entry point and we have 3 choices:

      1. Choice 0: Represented by Node 3 (subnodes = 3), that in turn is an OR node composed of nodes 1 and 2 (subnodes.memoId = 1 and 2). Both nodes 1 and 2 have only one choice, that is to push down the StatusId = 3 predicate down to the or branch and leverage the compound index

      Generates the following tagged tree 

      $and
          $or
              UnitId $gte 100.0 || Selected Index #0 pos 1 combine 1
              UnitId $gte 6.0 || Selected Index #0 pos 1 combine 1
          Status $eq 3 || Move to 0 || Selected Index #0 pos 0 combine 1
       || Move to 1 || Selected Index #0 pos 0 combine 1 

      and plan

      PROJ
      ---proj = { _id: false, UnitId: true }
      ---type = DEFAULT
      ---nodeId = 4
      ---fetched = 0
      ---sortedByDiskLoc = 0
      ---providedSorts = {baseSortPattern: {}, ignoredFields: []}
      ---Child:
      ------OR
      ---------nodeId = 3
      ---------fetched = 0
      ---------sortedByDiskLoc = 0
      ---------providedSorts = {baseSortPattern: {}, ignoredFields: []}
      ---------Child 0:
      ------------IXSCAN
      ---------------indexName = Status_1_UnitId_1
      ---------------keyPattern = { Status: 1.0, UnitId: 1.0 }
      ---------------direction = 1
      ---------------bounds = field #0[Status]: [3, 3], field #1[UnitId]: [100.0, inf.0]
      ---------------nodeId = 1
      ---------------fetched = 0
      ---------------sortedByDiskLoc = 0
      ---------------providedSorts = {baseSortPattern: { UnitId: 1 }, ignoredFields: [Status]}---------Child 1:
      ------------IXSCAN
      ---------------indexName = Status_1_UnitId_1
      ---------------keyPattern = { Status: 1.0, UnitId: 1.0 }
      ---------------direction = 1
      ---------------bounds = field #0[Status]: [3, 3], field #1[UnitId]: [6.0, inf.0]
      ---------------nodeId = 2
      ---------------fetched = 0
      ---------------sortedByDiskLoc = 0
      ---------------providedSorts = {baseSortPattern: { UnitId: 1 }, ignoredFields: [Status]} 
      1. Choice 1: Simply leverage the index to fulfil Status = 3

      Generates this tagged tree

      $and || Selected Index #0 pos 0 combine 1
          $or
              UnitId $gte 100.0
              UnitId $gte 6.0
          Status $eq 3 || Selected Index #0 pos 0 combine 1 

      and plan

      PROJ
      ---proj = { _id: false, UnitId: true }
      ---type = SIMPLE_DOC
      ---nodeId = 3
      ---fetched = 1
      ---sortedByDiskLoc = 0
      ---providedSorts = {baseSortPattern: {}, ignoredFields: []}
      ---Child:
      ------FETCH
      ---------filter:
                      $or
                          UnitId $gte 100.0
                          UnitId $gte 6.0
      ---------nodeId = 2
      ---------fetched = 1
      ---------sortedByDiskLoc = 0
      ---------providedSorts = {baseSortPattern: { UnitId: 1 }, ignoredFields: [Status]}
      ---------Child:
      ------------IXSCAN
      ---------------indexName = Status_1_UnitId_1
      ---------------keyPattern = { Status: 1.0, UnitId: 1.0 }
      ---------------direction = 1
      ---------------bounds = field #0[Status]: [3, 3], field #1[UnitId]: [MinKey, MaxKey]
      ---------------nodeId = 1
      ---------------fetched = 0
      ---------------sortedByDiskLoc = 0
      ---------------providedSorts = {baseSortPattern: { UnitId: 1 }, ignoredFields: [Status]} 
      1. Choice 2: Combination of both. Discarded as considered to require index intersection.

      Generates this tagged tree

      $and || Selected Index #0 pos 1 combine 1
          $or || Selected Index #0 pos 1 combine 1
              $and || Selected Index #0 pos 0 combine 1
                  Status $eq 3 || Selected Index #0 pos 0 combine 1
                  UnitId $gte 100.0 || Selected Index #0 pos 1 combine 1
              $and || Selected Index #0 pos 0 combine 1
                  Status $eq 3 || Selected Index #0 pos 0 combine 1
                  UnitId $gte 6.0 || Selected Index #0 pos 1 combine 1 

      And as mentioned fails to build a solution with error: Can't build index intersection solution: AND_SORTED is not possible and AND_HASH is disabled

       

      In this case, we're interested in the second solution where we see that:

      1. The 'OR' branch is not tagged as without the or pushdown cannot be indexed
      2. Since it is not tagged, then it is not easy to determine that it can be resolved using the index.

            Assignee:
            Unassigned
            Reporter:
            Carlos Alonso Pérez
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated: