[SERVER-36202] when creating new index, use existing index if possible to avoid collection scan Created: 19/Jul/18  Updated: 04/Dec/23  Resolved: 04/Dec/23

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 0
Labels: pmr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-48697 Create new index base on existing ones Closed
Related
is related to SERVER-3150 Allow creation of sparse index over a... Closed
is related to SERVER-51458 Allow creation of a partial index usi... Backlog
is related to SERVER-51459 Allow creation of index based on seco... Backlog
is related to SERVER-51460 Investigate supporting creation of in... Backlog
is related to SERVER-51461 Support building indexes from existin... Backlog
is related to SERVER-51466 Investigate support for building a mu... Backlog
Assigned Teams:
Storage Execution
Sprint: Execution Team 2020-10-05
Participants:

 Description   

A user pointed out that if there is an index on a:1, b:1  and they specify they want to create a new index b:1 with partialFilterExpression:{a:10} (or any other filter on a then index build could scan the existing index and build new index much faster.

Common use case that can benefit from this may also be a case where there exists an index on a:1, b:1, c:1 and user wants to replace it with index on just a:1, b:1 which again can be build (hopefully) faster from existing index than collection scan.



 Comments   
Comment by Steven Vannelli [ 04/Dec/23 ]

The team discussed this in our Product Sync today and the group is comfortable closing this given the more specific linked tickets that have been created. 

Comment by Louis Williams [ 09/Oct/20 ]

Due to the complexity of this task, I have broken it up into several tickets, linked. This ticket will stay open for watchers, but we are implementing the initial components of this change in SERVER-51461, guarded by a feature flag.

Aside from the filed tickets, SERVER-51461 does not implement yielding, which will need to be addressed.

Comment by Daniel Gottlieb (Inactive) [ 15/Sep/20 ]

The third case should also improve in terms of a smaller memory footprint (avoiding the need to page in full documents).

Comment by Louis Williams [ 15/Sep/20 ]

I want to elaborate on these examples to clarify what would and would not be an improvement if we used an existing index build a new one.

  • Existing index on a:1, b:1, c:1. Creating a new index on a:1, b:1 or a: 1 or a:1, partialFilterExpression: b
    • This is definitely an improvement because we would not need to sort any keys. This eliminates one of the most expensive parts of the index build. I think this is a worthwhile improvement.
  • Existing index on a:1. Creating a new index on b:1, a:1 or b:1 with partialFilterExpression on 'a'
    • This will not be an improvement because we still have to read from the collection and sort keys for 'b'. The addition of reading from the index will only add more time. Remember that an IXSCAN + FETCH is slower than a COLLSCAN
  • Existing index on a:1, b:1. Creating a new index on b:1 or b:1 with partialFilterExpression on 'a'
    • This might be an improvement. We would still need to sort the keys for 'b'. We wouldn't need to scan the collection to generate keys, but we would need scan the index to generate new keys. This improvement would have to come from avoiding key generation from BSON to KeyString, which is very highly optimized. We would have to show that key generation from an existing KeyString(a,b) -> KeyString(b) is significantly faster than BSON -> KeyString.
Generated at Thu Feb 08 04:42:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.