[SERVER-12614] Proposal for enhancing MongoDB indexes Created: 04/Feb/14 Updated: 06/Dec/22 Resolved: 22/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | chekkal | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | indexing, performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query |
| Participants: |
| Description |
|
One of the things we learn from the "Index Cardinality" video [M101J: MongoDB for Java Developers] is that when a document with a multikey index is moved, all of its indexes must be updated as well, which incurs significant overhead. I wondered whether it would be possible to bypass this constraint. The obvious solution is to add another level of indirection (a famous pattern for solving computer science problems). But all hope is not lost; in MongoDB every document has an immutable _id field, which is automatically indexed. We know that if a document is ever moved, its _id index entry will be updated too, so why not make all the other indexes reference the document's _id index entry instead of its location? With this solution, the only index that would ever be updated when a document moves is the _id index. I would like to know whether this could be implemented in MongoDB, or whether there are hidden gotchas that would make it impractical. |
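To make the proposal concrete, here is a minimal toy sketch in Python. It is not MongoDB code: the `Collection` class, its fields, and the `index_writes` counter are hypothetical names invented for illustration. It assumes secondary indexes store `_id` values while only the `_id` index stores record locations.

```python
class Collection:
    """Toy model (hypothetical): secondary indexes hold _id values, not locations."""

    def __init__(self):
        self.storage = {}      # location -> document
        self.id_index = {}     # _id -> location (the only index that knows locations)
        self.secondary = {}    # (field, value) -> set of _id values
        self.next_loc = 0
        self.index_writes = 0  # index entries rewritten because of moves

    def insert(self, doc, indexed_fields):
        loc = self.next_loc
        self.next_loc += 1
        self.storage[loc] = doc
        self.id_index[doc["_id"]] = loc
        for field in indexed_fields:
            value = doc[field]
            # An array field produces one index entry per element (multikey).
            values = value if isinstance(value, list) else [value]
            for v in values:
                self.secondary.setdefault((field, v), set()).add(doc["_id"])

    def move(self, _id):
        """Relocate a document; only the _id index entry is rewritten."""
        old = self.id_index[_id]
        new = self.next_loc
        self.next_loc += 1
        self.storage[new] = self.storage.pop(old)
        self.id_index[_id] = new
        self.index_writes += 1

    def find(self, field, value):
        """Secondary lookup: resolve each _id through the _id index."""
        ids = self.secondary.get((field, value), set())
        return [self.storage[self.id_index[i]] for i in sorted(ids)]
```

In this sketch a document with three multikey entries costs one index write per move rather than three, but every `find` performs an additional `_id` lookup per matching document.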
| Comments |
| Comment by Asya Kamsky [ 22/Feb/18 ] |
|
Not an issue with WiredTiger. |
| Comment by Andy Schwerin [ 04/Feb/14 ] |
|
It's feasible, but it makes all reads access the primary index. So, if you want to read a document that you find via a secondary index, you must take that _id and then look it up in the primary index to find the current location. Depending on the application, that might be a good tradeoff or a bad one. Other database systems in the past have used special markers in the old location of records, sometimes called tombstones, to point to new locations. This lets you pay for the indirection only when a document does move, at the cost of needing to periodically clean up the indexes so that you can garbage collect old tombstones. |
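The tombstone alternative can be sketched the same way. This is a toy Python model under stated assumptions, not how any particular database implements it: `TombstoneStore` and its methods are hypothetical names, and index entries point directly at record locations.

```python
class TombstoneStore:
    """Toy model (hypothetical): a moved record leaves a forwarding marker behind."""

    def __init__(self):
        self.slots = {}    # location -> ("doc", document) or ("moved", new_location)
        self.index = {}    # key -> location (may be stale after a move)
        self.next_loc = 0

    def _alloc(self):
        loc = self.next_loc
        self.next_loc += 1
        return loc

    def insert(self, key, doc):
        loc = self._alloc()
        self.slots[loc] = ("doc", doc)
        self.index[key] = loc
        return loc

    def move(self, loc):
        """Relocate a record without touching any index entry."""
        kind, doc = self.slots[loc]
        if kind != "doc":
            raise ValueError("location does not hold a live document")
        new = self._alloc()
        self.slots[new] = ("doc", doc)
        self.slots[loc] = ("moved", new)  # tombstone pointing to the new location
        return new

    def _resolve(self, loc):
        hops = 0
        while self.slots[loc][0] == "moved":
            loc = self.slots[loc][1]
            hops += 1
        return loc, hops

    def read(self, key):
        """Return (document, hops); hops is 0 unless the record has moved."""
        loc, hops = self._resolve(self.index[key])
        return self.slots[loc][1], hops

    def cleanup(self):
        """Repoint stale index entries, then garbage collect the tombstones."""
        for key, loc in list(self.index.items()):
            self.index[key] = self._resolve(loc)[0]
        dead = [loc for loc, (kind, _) in self.slots.items() if kind == "moved"]
        for loc in dead:
            del self.slots[loc]
        return len(dead)
```

Reads of records that never moved take zero extra hops, which is the point of the approach. The periodic `cleanup` pass stands in for the index maintenance needed before old tombstones can be reclaimed.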