Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-38099

Multiple index filters may be necessary to work on aggregations

    • Query Execution
    • Fully Compatible
    • ALL

      When an index filter is created, it may not be applied when running an aggregation. This is because as part of running an aggregation we may consider multiple query-layer plans, each of which have different shapes, and index filters are only applied at the query layer. For example, we will first try to produce a plan using the projection from the aggregation's dependencies. If that doesn't work, we'll try again with no projection. This means that there are two possible query plans that the aggregation system will use.

      This is a nuisance for users, since it requires them to know what query-level plan the aggregation may use.

      Furthermore, if the aggregation cannot use one query-level plan, it may fall back to another. In this situation, multiple index filters are necessary for the correct index to be used.

      Here's an example showing how it's necessary to create two index filters before it's applied:

      // Insert some data.
      db.names.insert({"name":"miguel", "age":25, "somethingelse":1});
      
      // Create some indexes.
      assert.commandWorked(db.names.createIndex({"name":1}));
      assert.commandWorked(db.names.createIndex({"name":1,"age":1}));
      assert.commandWorked(db.names.createIndex({"name":1,"age":1,"somethingelse":1}));
      
      print("Running explain without index filter");
      printjson(db.names.explain().aggregate( [{ $match: { 'name': "miguel", 'somethingelse': 1 }}, { $group: { _id: { '_name': "$name" },  mycount: { $sum: 1 } } }]));
      
      // Add index filter without projection.
      print("Added index filter with no projection");
      assert.commandWorked(db.runCommand({
          planCacheSetFilter: "names",
          query : {
              
                  "$and" : [                      
                     { 'name': "miguel" }, 
                   {   'somethingelse': 1 }
                  ]
         },
          "sort" : {},
          "projection" : {},
          "indexes" : [ "name_1_age_1" ]
      }));
      
      
      print("Explain now that index filter is present");
      printjson(db.names.explain().aggregate( [{ $match: { 'name': "miguel", 'somethingelse': 1 }}, { $group: { _id: { '_name': "$name" },  mycount: { $sum: 1 } } }]));
      // The index filter had no effect! The index name_1_age_1_somethingelse_1 is still used!
      
      // Now create another index filter.
      print("Add index filter w/ projection");
      assert.commandWorked(db.runCommand({
          planCacheSetFilter: "names",
          query : {
              
                  "$and" : [                      
                     { 'name': "miguel" }, 
                   {   'somethingelse': 1 }
                  ]
         },
          "sort" : {},
          "projection" : { "name" : 1, "_id" : 0},
          "indexes" : [ "name_1_age_1" ]
      }));
      
      print("Explain now that 2nd index filter is present");
      printjson(db.names.explain().aggregate( [{ $match: { 'name': "miguel", 'somethingelse': 1 }}, { $group: { _id: { '_name': "$name" },  mycount: { $sum: 1 } } }]));
      // This time the index name_1_age_1 was actually used.
      
      // Now remove the first index filter we created.
      print("remove index filter which has no projection specified");
      assert.commandWorked(db.runCommand({
          planCacheClearFilters: "names",
          query : {
              
                  "$and" : [                      
                     { 'name': "miguel" }, 
                   {   'somethingelse': 1 }
                  ]
         },
          "sort" : {},
          "projection" : {},
          "indexes" : ["name_1_age_1"]
      }));
      
      print("Running explain again");
      printjson(db.names.explain().aggregate( [{ $match: { 'name': "miguel", 'somethingelse': 1 }}, { $group: { _id: { '_name': "$name" },  mycount: { $sum: 1 } } }]));
      // The planner again uses the "wrong" index, name_1_age_1_somethingelse_1
      

      Part of this problem could be fixed if we pushed down projections into the query layer. If that were done, the aggregation layer would only need to create an executor once (rather than twice, also here). Then, the user would only need to create one index filter, though they would still need to know about which query-level plan the agg system uses.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            ian.boros@mongodb.com Ian Boros
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: