[SERVER-26580] allow using partial index on query where predicate matches only partial filter expression Created: 11/Oct/16  Updated: 30/May/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 5
Labels: QFB, asya
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-31172 index not used when it has partialFil... Closed
Related
is related to SERVER-23808 Implement Index Choice based on Parti... Backlog
is related to SERVER-28889 Partial index shouldn't do fetch on c... Closed
Assigned Teams:
Query Optimization
Sprint: Query 2017-05-08, Query 2017-05-29
Participants:
Case:

 Description   

If I have partial index where partial filter expression is {x:1} it can be used for queries where the only condition is {x:1} (we would scan the entire index but that might be a lot faster than collection scan.

It's currently possible to hint such an index, but it doesn't look like we ever choose or consider it.

Possibly related SERVER-20066



 Comments   
Comment by Eric Maciel [ 30/Mar/23 ]

Is this resolved in any of the later versions of mongo? In particular the issue outlined here severely limits the utility of partialFilterExpression indexes or it requires us to have all of the fields in our partialFilterExpression also included in our index.

https://jira.mongodb.org/browse/SERVER-48777

Comment by Asya Kamsky [ 26/Jun/20 ]

The last example has a workaround - include the field from the partial filter expression in the index as another key. This will allow fully covered plan.

Comment by Thilo Schmalfuß [ 16/Jun/20 ]

I recently stumbled in this problem, too: https://jira.mongodb.org/browse/SERVER-48777

For us this is a critical issue, since it limit the usefulness of partial indices a lot.

If this problem is not considered for development soon, than it should at least be reflected in the documentation:

https://docs.mongodb.com/manual/core/query-optimization/#covered-query
https://docs.mongodb.com/manual/core/index-partial/

Comment by Brad Vogel [ 20/Sep/17 ]

Here's one more where the index is not used:

db.test.insert({a: {b:2}})
db.test.ensureIndex({ 'a.b': 1}, {
  partialFilterExpression: { 'a': { $exists: true } }
})
db.test.find({'a.b': 2}).explain(true)

From SERVER-31172 that is duped to this one

Comment by Kyle Suarez [ 27/Apr/17 ]

There are multiple requests in this ticket related to partial indexes and query planning. For my own sanity, I'm going to list all the scenarios below.

When the collection has a partial index {a: 1} with partialFilterExpression: {b: 5}:

  • (1) Both find and aggregate should use the index if the query is compatible with the partialFilterExpression (rather than using a COLLSCAN).
  • (2) distinct should use a DISTINCT_SCAN over the index if the index (1) starts with the distinct key and (2) has a partialFilterExpression that is compatible with the query (rather than a COLLSCAN).
  • (3) A find with query {b: 5, a: {$gte: 1}} should be covered by the index and not require a FETCH stage.

When the collection also has a compound index {a: 1, b: 1}:

  • (4) aggregate should prefer the (presumably more selective) partial index over the compound index. At present, it doesn't seem like the index is even considered.
  • (5) distinct with distinct key "a" and query {b: 5, a: {$gte: 1}} should consider a DISTINCT_SCAN on the partial index. (Currently, the only candidate plan is a DISTINCT_SCAN on the compound index. Strangely, the problem doesn't manifest if you replace $gte with $exists.)
Comment by Asya Kamsky [ 27/Jan/17 ]

This is important for views where you want to have a single large collection appear as several smaller collections so every query under the hood will start with something like tenant_id='X' and each view is for a specific tenant_id only (which would be easier if we fix SERVER-25023)

Comment by Asya Kamsky [ 05/Jan/17 ]

When I just specify

{b:5}

as condition without sort, neither find nor aggregate considers the partial index at all (which is the original bug, it's not different between agg and find).

But another discrepancy is which index we use whether it's an aggregation or find command (i.e. views vs collection), same two indexes as above, a:1 is partial on b:5 condition and also a regular index a:1, b:1.

command: find          { find: "foo",            filter: { b: 5.0, a: { $exists: true } }, sort: { a: 1.0 }, projection: { _id: 0.0, a: 1.0 } }                   planSummary: IXSCAN { a: 1.0 } keysExamined:5 docsExamined:5 cursorExhausted:1 nreturned:5 reslen:176 0ms
command: aggregate { aggregate: "foo", pipeline: [ { $match: { b: 5.0, a: { $exists: true } } }, { $sort: { a: 1.0 } }, { $project: { _id: 0.0, a: 1.0 } } ], cursor: {} } planSummary: IXSCAN { a: 1.0, b: 1.0 } keysExamined:8 docsExamined:5 cursorExhausted:1 nreturned:5 reslen:176 0ms

These two commands (find and aggregate) are identical in both meaning and results. But aggregate gets the "wrong" index instead of perfectly valid partial index. Is that an instance of agg preferring non-blocking stage (they are both non-blocking though!) so the issue is we don't properly realize partial index is non-blocking?

Note that when I change a:$exists to be $gt:0 it doesn't change index selection, but find uses partial index but not as covered index which may be a separate bug:

command: find         { find: "foo",             filter: { b: 5.0, a: { $gt: 0.0 } }, sort: { a: 1.0 }, projection: { _id: 0.0, a: 1.0 } }                                          planSummary: IXSCAN { a: 1.0 } keysExamined:5 docsExamined:5 fromMultiPlanner:1 cursorExhausted:1 nreturned:5 reslen:176
command: aggregate { aggregate: "foo", pipeline: [ { $match: { b: 5.0, a: { $gt: 0.0 } } }, { $sort: { a: 1.0 } }, { $project: { _id: 0.0, a: 1.0 } } ], cursor: {} } planSummary: IXSCAN { a: 1.0, b: 1.0 } keysExamined:8 docsExamined:0 cursorExhausted:1 nreturned:5 reslen:176

Distinct shows strange behavior of using a:1,b:1 covered even though it's not most efficient, but even with condition on a added, it does not use covered on partial index.

command: distinct { distinct: "foo", key: "a", query: { b: 5.0 } } planSummary: COLLSCAN keysExamined:0 docsExamined:9 numYields:0 reslen:57
command: distinct { distinct: "foo", key: "a", query: { b: 5.0, a: { $exists: true } } } planSummary: IXSCAN { a: 1.0 } keysExamined:5 docsExamined:5 numYields:0 reslen:57
command: distinct { distinct: "foo", key: "a", query: { b: 5.0, a: { $gt: 0.0 } } } planSummary: DISTINCT_SCAN { a: 1.0, b: 1.0 } keysExamined:4 docsExamined:0 numYields:0 reslen:57

Comment by Asya Kamsky [ 05/Jan/17 ]

Here's a case where we don't consider a partial index for distinct even though partial filter matches the query exactly.

db.foo.createIndex({a:1}, {partialFilterExpression:{b:5}});
db.foo.distinct("a", {b:5})

We use a collection scan even though we could use the index. We can use DISTINCT_SCAN same as we would on index b:1,a:1 if it existed.

Generated at Thu Feb 08 04:12:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.