[SERVER-28889] Partial index shouldn't do fetch on conditions that are true by the definition of the index Created: 20/Apr/17  Updated: 09/Jan/24  Resolved: 28/Apr/23

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.0-rc5

Type: Improvement Priority: Major - P3
Reporter: Alexander Komyagin Assignee: Milena Ivanova
Resolution: Fixed Votes: 13
Labels: asya, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-29247 COUNT_SCAN not used for partial indexes Closed
is duplicated by SERVER-42647 Compound partialFilterExpression not ... Closed
is duplicated by SERVER-44836 count_documents with filter does not ... Closed
is duplicated by SERVER-48777 Covered Queries do not work with part... Closed
is duplicated by SERVER-32035 Index doesn't cover a query for a par... Closed
Related
related to SERVER-29247 COUNT_SCAN not used for partial indexes Closed
related to SERVER-55065 Null queries should be covered by ind... Closed
related to SERVER-56020 Unnecessary FETCH stage Closed
related to SERVER-26580 allow using partial index on query wh... Backlog
is related to SERVER-26896 Use knowledge about partial index fil... Backlog
is related to SERVER-75025 [CQF] Add test for optimizer-specific... Backlog
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.0, v6.3, v6.0, v5.0, v4.4
Sprint: QO 2023-03-06, QO 2023-03-20, QO 2023-04-03, QO 2023-04-17, QO 2023-05-01
Participants:
Case:

 Description   

db.xxx.ensureIndex({x:1},{partialFilterExpression:{bin:true}})

> db.xxx.find({bin:true,x:1}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.xxx",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{"bin" : {"$eq" : true}},
				{"x" : {"$eq" : 1}}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {"bin" : {"$eq" : true}},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {"x" : 1},
				"indexName" : "x_1",
				"isMultiKey" : false,
				"multiKeyPaths" : { "x" : [ ] },
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : { "x" : [ "[1.0, 1.0]" 	] }
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "AD-MAC10G.local",
		"port" : 27017,
		"version" : "3.4.2",
		"gitVersion" : "3f76e40c105fc223b3e5aac3e20dcd026b83b38b"
	},
	"ok" : 1
}

There's no need to do the FETCH with a check on bin:true since that condition is already satisfied by the index partial filterexpression



 Comments   
Comment by Githook User [ 16/Jun/23 ]

Author:

{'name': 'Milena Ivanova', 'email': 'milena.ivanova@mongodb.com', 'username': 'mivanova3'}

Message: SERVER-28889 Partial index shouldn't do fetch on conditions that are true by the definition of the index

(cherry picked from commit c4e7c74c4421f42a3bb23f4247a0d2721384d38d)
Branch: v7.0
https://github.com/mongodb/mongo/commit/bf1e01d882974259df2f8efdd9a075d87801d038

Comment by Githook User [ 27/Apr/23 ]

Author:

{'name': 'Milena Ivanova', 'email': 'milena.ivanova@mongodb.com', 'username': 'mivanova3'}

Message: SERVER-28889 Partial index shouldn't do fetch on conditions that are true by the definition of the index
Branch: master
https://github.com/mongodb/mongo/commit/c4e7c74c4421f42a3bb23f4247a0d2721384d38d

Comment by Linus Unnebäck [ 31/Oct/22 ]

I believe that I'm running into this issue. We have an index defined as follows:

 
db.getCollection('Order').createIndex(
  { restaurantId: 1, price: 1 },
  { name: 'foobar', partialFilterExpression:

{ status: 'paid' }

}
)
 

We then do the following aggregation:

 
db.getCollection('Order').aggregate([
  {
    $match:

{       restaurantId: 'foo',       status: 'paid'     }

  },
  {
    $group:

{       _id: '$restaurantId',       count: \{ '$sum': 1 }

,
      grossAmount: { '$sum': '$price' }
    }
  }
])
 

And the plan looks like this:

 
"winningPlan": {
  "stage": "PROJECTION_SIMPLE",
  "transformBy":

{     "restaurantId": 1,     "price": 1,     "_id": 0   }

,
  "inputStage": {
    "stage": "FETCH",
    "filter": {
      "status":

{         "$eq": "paid"       }

    },
    "inputStage": {
      "stage": "IXSCAN",
      "keyPattern":

{         "restaurantId": 1,         "price": 1       }

,
      "indexName": "api_XU807VP5eJ",
 

A temporary workaround is to remove the `status: 'paid'` match condition, and specify an explicit index with the `hint` parameter. But this seems a bit scary 😅

Comment by Ralf Strobel [ 17/Jul/22 ]

Ah, sorry, I copied the original example without thinking. I can confirm this does work in 4.4...

db.test.ensureIndex({x:1, bin:1},{partialFilterExpression:{bin:true}})
db.test.insertOne({x: 1, bin: true})
db.test.explain().count({x:1, bin: true})
...
winningPlan: 
      { stage: 'COUNT',
        inputStage: 
         { stage: 'COUNT_SCAN',
           keyPattern: { x: 1, bin: 1 },
           ...
           indexBounds: 
            { startKey: { x: 1, bin: true },
              startKeyInclusive: true,
              endKey: { x: 1, bin: true },
              endKeyInclusive: true } } },

Still, likely nothing we would want to do in practice. But I will test it out in 5.0 eventually, where our null use-case should be handled more favorably.

Comment by Asya Kamsky [ 17/Jul/22 ]

That’s doing a FETCH because you’re returning the full document. Note that fetch is not applying a filter to the FETCH.

Comment by Ralf Strobel [ 17/Jul/22 ]

Thank you for the insights Asya. I will have to try this once we are migrated to 5.0.

Speaking for 4.4, I cannot confirm the mentioned workaround is effective either...

db.test.ensureIndex({x:1, bin:1},{partialFilterExpression:{bin:true}})
db.test.insertOne({x: 1, bin: true})
db.test.find({x:1, bin: true}).explain()
...
     winningPlan: 
      { stage: 'FETCH',
        inputStage: 
         { stage: 'IXSCAN',
           keyPattern: { x: 1, bin: 1 },
           ...
           indexBounds: { x: [ '[1, 1]' ], bin: [ '[true, true]' ] } } },

In our use case, we are also explicitly trying not to include the deleted field in all of our indexes, so I don't find this workaround very satisfying even if it does work.

Comment by Asya Kamsky [ 15/Jul/22 ]

The issue described in this ticket has a workaround to avoid FETCH in scenarios where partial filter condition is something other than $exists - the workaround does NOT apply to $exists filter.

Workaround: include the field(s) in partialFilterExpression in the index definition. So for example given in the description, the index definition would become x:1, bin:1 and that would allow document FETCH to be avoided.

Comment by Asya Kamsky [ 15/Jul/22 ]

ralf.strobel  the issue you are observing is not related to partial indexes, but rather null semantics and COUNT and the example you show is specifically fixed in 5.0 by SERVER-55065 and no longer uses FETCH in your example.

		"winningPlan" : {
			"stage" : "COUNT",
			"inputStage" : {
				"stage" : "OR",
				"inputStages" : [
					{
						"stage" : "COUNT_SCAN",
						"keyPattern" : {
							"deleted" : 1,
							"value" : 1
						},
						"indexName" : "deleted_1_value_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"deleted" : [ ],
							"value" : [ ]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"indexBounds" : {
							"startKey" : {
								"deleted" : undefined,
								"value" : 1
							},
							"startKeyInclusive" : true,
							"endKey" : {
								"deleted" : undefined,
								"value" : 1
							},
							"endKeyInclusive" : true
						}
					},
					{
						"stage" : "COUNT_SCAN",
						"keyPattern" : {
							"deleted" : 1,
							"value" : 1
						},
						"indexName" : "deleted_1_value_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"deleted" : [ ],
							"value" : [ ]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"indexBounds" : {
							"startKey" : {
								"deleted" : null,
								"value" : 1
							},
							"startKeyInclusive" : true,
							"endKey" : {
								"deleted" : null,
								"value" : 1
							},
							"endKeyInclusive" : true
						}
					}
				]
			}
 

Comment by Ralf Strobel [ 07/Jul/22 ]

Hi, just trying to raise awareness for this issue again, as we are facing more and more problems due to it. We're currently trying to gather regular metrics about our collection sizes, which could be trivially answered by the database if this was implemented. Due to this issue, we instead end up doing expensive index scans and flooding the cache with unnecessary documents, lowering our overall database performance.

EDIT: Interestingly, a compound index also seems to suffer from a similar issue when filtering by null...

db.test.insertMany([ { value: 5 }, { value: 6, deleted: true } ])
db.test.createIndex( { deleted: 1, value: 1 } )

db.test.explain().count({deleted: null, value: 1})
{ stage: 'COUNT',
  inputStage: 
    { stage: 'FETCH',
      filter: { deleted: { '$eq': null } },
      inputStage: 
        { stage: 'IXSCAN',
          keyPattern: { deleted: 1, value: 1 },
          indexBounds: 
               { deleted: [ '[undefined, undefined]', '[null, null]' ],
                 value: [ '[1, 1]' ] } } } },

whereas

db.test.explain().count({deleted: false, value: 1})
{ stage: 'COUNT',
  inputStage: 
   { stage: 'COUNT_SCAN',
     keyPattern: { deleted: 1, value: 1 },
     indexBounds: 
       { startKey: { deleted: false, value: 1 },
         startKeyInclusive: true,
         endKey: { deleted: false, value: 1 },
         endKeyInclusive: true } } },

Comment by Jacob Botuck [ 30/Sep/20 ]

Does this issue also affect deletes?
I have an index on field foo.bar with a partial filter expression on foo: $exists=true

I want to delete all records in the index. I do delete where foo: $exists=true AND foo.bar > MinKey. It is going really slow!

Comment by Ralf Strobel [ 26/Mar/20 ]

We would greatly appreciate the implementation of this improvement. For us and probably many others, this is not a fringe case optimization, but would affect literally every database query...

Consider a common soft-delete mechanism, implemented via a boolean flag field:

db.test.insertMany([ { value: 5 }, { value: 6, deleted: true } ])
db.test.createIndex( { value: 1 }, { partialFilterExpression: { deleted: null } } )

In this environment, every query will contain a respective filter on non-deleted objects:

db.test.find({ value: 5, deleted: null })
db.test.count({ value: 5, deleted: null })
db.test.distinct("value", { deleted: null })

As pointed out by some of the duplicate issues, especially the count and distinct queries will currently perform at sub-optimal performance, as they cannot use the COUNT_SCAN or DISTINCT_SCAN stages directly on the index, but need to fetch the full object data instead.

Comment by Eliott Coyac [ 09/Aug/19 ]

As of now, we can't count the number of elements indexed with a partialFilterExpression on another field without doing a FETCH.

Comment by Kyle Suarez [ 20/Apr/17 ]

Looks like we still do the FETCH even when there's an explicit projection:

> db.xxx.find({bin:true, x:1}, {x:1, _id: 0}).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "test.xxx",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "" : [
                                {
                                        "bin" : {
                                                "" : true
                                        }
                                },
                                {
                                        "x" : {
                                                "" : 1
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "PROJECTION",
                        "transformBy" : {
                                "x" : 1,
                                "_id" : 0
                        },
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "bin" : {
                                                "" : true
                                        }
                                },
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                                "x" : 1
                                        },
                                        "indexName" : "x_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                                "x" : [ ]
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : true,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "x" : [
                                                        "[1.0, 1.0]"
                                                ]
                                        }
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "celadon",
                "port" : 27017,
                "version" : "0.0.0",
                "gitVersion" : "unknown"
        },
        "ok" : 1
}

Generated at Thu Feb 08 04:19:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.