[SERVER-12614] Proposal for enhancing MongoDB indexes Created: 04/Feb/14  Updated: 06/Dec/22  Resolved: 22/Feb/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: chekkal Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 0
Labels: indexing, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query
Participants:

 Description   

One of the things we learn from the "Index Cardinality" video [M101J: MongoDB for Java Developers] is that when a document with a multikey index gets moved, all of its indexes must be updated as well, which incurs significant overhead.

I wondered whether it would be possible to bypass this constraint somehow. The obvious solution is to add another level of indirection (a well-known pattern for solving computer science problems): instead of referencing the document directly from the index, we create an entity for each document that references that document, and have the indexes reference that entity. When we move the document, we then have to modify only that entity (the entity itself will never move, because its BSON shape will always be the same). The drawback of this solution, of course, is that it trades space for performance (indexes suffer from the same problem).

But all hope is not lost: in MongoDB all documents have an immutable _id field, which is automatically indexed. Given this, we know that if a document is ever moved, its associated _id index entry will also be updated, so why not make all the other indexes reference the document's corresponding _id index entry?

With this solution, the only index that will ever be updated when a document moves is the _id index.
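
To make the idea concrete, here is a minimal sketch of the proposal as a toy in-memory model. It is not MongoDB's storage engine: the class ToyCollection, its fields, and the integer "locations" are all hypothetical names invented for illustration. Secondary (multikey) indexes store _id values, and only the _id index stores a location.

```python
class ToyCollection:
    """Toy model: secondary indexes hold _id values, not locations."""

    def __init__(self):
        self.storage = {}      # location -> document
        self.id_index = {}     # _id -> location (only index that knows locations)
        self.secondary = {}    # field -> {value -> set of _id}
        self.index_writes = 0  # index entries rewritten because of moves
        self._next_loc = 0

    def _alloc(self):
        # Hand out a fresh, never-reused location.
        loc = self._next_loc
        self._next_loc += 1
        return loc

    def _add_keys(self, field, doc):
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        # Multikey behaviour: one index entry per array element.
        for value in values:
            self.secondary[field].setdefault(value, set()).add(doc["_id"])

    def create_index(self, field):
        self.secondary[field] = {}
        for doc in self.storage.values():
            self._add_keys(field, doc)

    def insert(self, doc):
        loc = self._alloc()
        self.storage[loc] = doc
        self.id_index[doc["_id"]] = loc
        for field in self.secondary:
            self._add_keys(field, doc)

    def move(self, _id):
        # Relocate the document; only the _id index entry is rewritten.
        old = self.id_index[_id]
        new = self._alloc()
        self.storage[new] = self.storage.pop(old)
        self.id_index[_id] = new
        self.index_writes += 1

    def find(self, field, value):
        # Two lookups: secondary index -> _id, then _id index -> location.
        ids = self.secondary[field].get(value, set())
        return [self.storage[self.id_index[i]] for i in sorted(ids)]


coll = ToyCollection()
coll.create_index("tags")
coll.insert({"_id": 1, "tags": ["a", "b", "c"]})
coll.move(1)
print(coll.index_writes)       # one index write, regardless of array length
print(coll.find("tags", "b"))  # still found after the move
```

In this model a move costs one index write no matter how many multikey entries the document has, but every find through a secondary index pays an extra lookup in the _id index.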

Could this solution be implemented in MongoDB, or are there hidden gotchas that would make it impractical?



 Comments   
Comment by Asya Kamsky [ 22/Feb/18 ]

Not an issue with WiredTiger.

Comment by Andy Schwerin [ 04/Feb/14 ]

It's feasible, but it makes all reads access the primary index. So, if you want to read a document that you find via a secondary index, you must take that _id and then look it up in the primary index to find the current location. Depending on the application, that might be a good tradeoff or a bad one. Other database systems in the past have used special markers in the old location of records, sometimes called tombstones, to point to new locations. This lets you pay for the indirection only when a document does move, at the cost of needing to periodically clean up the indexes so that you can garbage collect old tombstones.
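
The tombstone alternative can be sketched the same way. This is a rough illustration under assumed names (Tombstone, TombstoneStore, and integer locations are all hypothetical), not how any particular database implements it. Index entries keep pointing at the old location, reads follow tombstones only when a document has moved, and a periodic cleanup pass repoints the index entries and garbage collects the tombstones.

```python
class Tombstone:
    """Marker left at a record's old location, pointing to the new one."""

    def __init__(self, new_loc):
        self.new_loc = new_loc


class TombstoneStore:
    def __init__(self):
        self.slots = {}  # location -> document or Tombstone
        self.index = {}  # index key -> location (may be stale after a move)
        self._next = 0

    def _alloc(self):
        loc = self._next
        self._next += 1
        return loc

    def insert(self, keys, doc):
        # All index keys for the document point directly at its location.
        loc = self._alloc()
        self.slots[loc] = doc
        for key in keys:
            self.index[key] = loc
        return loc

    def move(self, loc):
        # Relocate the record at its current location; no index is touched.
        new = self._alloc()
        self.slots[new] = self.slots[loc]
        self.slots[loc] = Tombstone(new)
        return new

    def _resolve(self, loc):
        # Follow the tombstone chain to the live record, counting hops.
        hops = 0
        while isinstance(self.slots[loc], Tombstone):
            loc = self.slots[loc].new_loc
            hops += 1
        return loc, hops

    def read(self, key):
        loc, hops = self._resolve(self.index[key])
        return self.slots[loc], hops

    def cleanup(self):
        # Repoint every index entry, then garbage collect the tombstones.
        for key in list(self.index):
            self.index[key] = self._resolve(self.index[key])[0]
        dead = [l for l, v in self.slots.items() if isinstance(v, Tombstone)]
        for loc in dead:
            del self.slots[loc]
        return len(dead)


store = TombstoneStore()
loc = store.insert(["a", "b", "c"], {"_id": 1})
loc = store.move(loc)
loc = store.move(loc)
print(store.read("b"))   # document plus the number of tombstone hops
print(store.cleanup())   # number of tombstones collected
print(store.read("b"))   # direct again after cleanup
```

Here a read pays for indirection only when the document has actually moved, and that cost disappears again once cleanup has run.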
