[DOCS-14071] Remove all references to "Indexes should fit in RAM" and similar variants Created: 23/Dec/20  Updated: 22/Jan/24  Due: 25/Dec/20

Status: Backlog
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Shakir Sadikali Assignee: Unassigned
Resolution: Unresolved Votes: 7
Labels: backlog, proactive, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Participants:
Days since reply: 2 years, 51 weeks, 5 days ago
Epic Link: DOCSP-11702
Story Points: 2

 Description   

We have numerous references in our documentation where we advise customers that "indexes should fit in RAM" for optimal performance.

For example:
https://docs.mongodb.com/manual/tutorial/ensure-indexes-fit-ram/

Notice we contradict the entire point of the page with the

and

https://docs.mongodb.com/manual/applications/indexes/
Where we say "When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing."

There are a few problems with these statements. It is true that an index held in RAM does not have to be read from disk, but this is true for all data used by the application. There is nothing special about having "just" the indexes in RAM: the more of the entire database that is in RAM, the better the performance will be.

The working set is composed of both data pages and index pages, and MongoDB offers no mechanism to pin specific data in cache. When a query executes, it must sequentially read the relevant index keys by pulling their pages into cache and then pull in the relevant document pages. This means the working set needs to include both the index pages and the data pages for the most frequently accessed data. We don't store an entire index in RAM; that would be a massive waste of RAM if only a small portion of the index is actually used for data access.
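To illustrate the point above, here is a minimal sketch of a shared page cache, assuming a simple LRU eviction policy and a made-up workload in which every query touches one index page and then one document page. The page counts, the `hot_fraction` parameter, and the LRU policy are all assumptions chosen for illustration; this is not how WiredTiger's cache is implemented.

```python
from collections import OrderedDict
import random


class PageCache:
    """A tiny LRU cache shared by index pages and document pages (illustrative only)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def touch(self, page_id):
        """Access a page: count a hit if cached, otherwise load it and evict if full."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(capacity_pages, hot_fraction, n_queries=50_000,
             index_pages=1_000, data_pages=10_000, seed=0):
    """Run queries that each read one index page, then one document page.

    hot_fraction is the (hypothetical) share of index and data pages
    that the application actually accesses.
    """
    rng = random.Random(seed)
    cache = PageCache(capacity_pages)
    hot_index = max(1, int(index_pages * hot_fraction))
    hot_data = max(1, int(data_pages * hot_fraction))
    for _ in range(n_queries):
        cache.touch(("index", rng.randrange(hot_index)))
        cache.touch(("data", rng.randrange(hot_data)))
    return cache.hit_rate()


if __name__ == "__main__":
    # The cache holds 1,000 pages: enough for the whole index, but not index plus data.
    print("small hot set:", simulate(1_000, hot_fraction=0.05))
    print("everything hot:", simulate(1_000, hot_fraction=1.0))
```

In this toy model the cache is exactly as large as the full index in both runs, yet the hit rate differs sharply, because what matters is whether the hot index pages and the hot document pages fit together.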

This advice ignores how WiredTiger works and gives customers flawed guidance. When customers read these statements, they inevitably open support cases in which we need to walk them away from what our documentation says, because the issue is ultimately more nuanced.

An alternative approach would emphasize the need to manage the working set, describe how index and data access occur in a running query, and explain how making queries efficient leads to optimal performance. I'm happy to assist if we'd like to alter the approach we're taking now.
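As one possible starting point for working-set guidance, the sketch below summarizes cache pressure from the `wiredTiger.cache` section of a `serverStatus` document. It assumes the document has already been fetched (for example with PyMongo's `db.command("serverStatus")`); the helper name `summarize_cache` and the sample numbers are hypothetical and purely illustrative.

```python
def summarize_cache(server_status):
    """Return simple ratios from the wiredTiger.cache section of serverStatus."""
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    dirty = cache["tracked dirty bytes in the cache"]
    return {
        "fill_ratio": used / limit,    # how full the cache is overall
        "dirty_ratio": dirty / limit,  # share of the cache holding unflushed changes
        # Cumulative counter; a fast-growing value suggests the working set
        # (index pages plus document pages) does not fit in cache.
        "pages_read_into_cache": cache["pages read into cache"],
    }


# Made-up sample values, not taken from a real deployment.
sample_status = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 6 * 1024**3,
            "maximum bytes configured": 8 * 1024**3,
            "tracked dirty bytes in the cache": 256 * 1024**2,
            "pages read into cache": 1_200_000,
        }
    }
}

if __name__ == "__main__":
    print(summarize_cache(sample_status))
```

These counters describe the cache as a whole and do not distinguish index pages from document pages, which is consistent with treating the working set as a single thing to manage.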

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Asya Kamsky [ 12/Feb/21 ]

I’m more concerned that we say the full index should fit in RAM when right-balanced indexes perform well with just the hot part of the index in RAM.

Generated at Thu Feb 08 08:09:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.