Documentation / DOCS-14071

Remove all references to "Indexes should fit in RAM" and similar variants


Details

    • Improvement
    • Status: Blocked
    • Major - P3
    • Resolution: Unresolved
    • None
    • None
    • manual, Server
    • None
    • 2
    • ServerDocs2020: Dec15 - Jan 5, ServerDocs2020: Jan5 - Jan12, ServerDocs2020: Jan12 - Jan19, ServerDocs2020: Jan19 - Jan26, ServerDocs2020: Jan26 - Feb2, ServerDocs2020: Feb2 - Feb9, ServerDocs2020: Feb9 - Feb16, ServerDocs2020: Feb16 - Feb23, ServerDocs2020: Feb23 - Mar2, ServerDocs2020: Mar2 - Mar9, ServerDocs2020: Mar9 - Mar16, ServerDocs2020: Mar16 - Mar23, ServerDocs2020: Mar23 - Mar30, ServerDocs2020: Mar30 - Apr06, ServerDocs2020: Apr6 - Apr13
    • true

    Description

      We have numerous references in our documentation where we advise customers that "indexes should fit in RAM" for optimal performance.

      For example:
      https://docs.mongodb.com/manual/tutorial/ensure-indexes-fit-ram/

      Notice we contradict the entire point of the page with the

      and

      https://docs.mongodb.com/manual/applications/indexes/
      There, we say: "When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing."

      There are a few problems with making these statements. While it's true that having indexes in RAM means you don't have to read them from disk, THIS IS TRUE FOR ALL DATA USED BY THE APPLICATION. There's nothing special about having "just" indexes in RAM: the more of your entire database you have in RAM, the better the performance will be.

      The working set is composed of both data and index pages, and MongoDB offers no mechanism to pin specific data in cache. When a query executes, it must sequentially read the relevant index keys, pulling their pages into cache, and then pull in the relevant document pages. This means the working set needs to contain both the index pages and the data pages for the most frequently accessed data. We don't load an entire index into RAM; that would be a massive waste of RAM when only a small portion of the index is actually used for data access.
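To make the page-level behavior concrete, here is a toy model in Python. It is a sketch under stated assumptions, not WiredTiger's implementation: the cache is a plain LRU (WiredTiger's eviction is not strict LRU), only index leaf pages are modeled, documents are assumed to be stored in key order, and the page fan-outs (`KEYS_PER_INDEX_PAGE`, `DOCS_PER_DATA_PAGE`) and names (`PageCache`, `run_query`) are hypothetical. For each key it reads, a query pulls in an index page and then a data page, and the cache ends up holding only the pages that were actually touched.

```python
from collections import OrderedDict

KEYS_PER_INDEX_PAGE = 100  # hypothetical: index keys per index leaf page
DOCS_PER_DATA_PAGE = 10    # hypothetical: documents per data page


class PageCache:
    """Toy LRU page cache (an assumption; WiredTiger eviction is not strict LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> present, ordered by recency
        self.hits = 0
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # stands in for a read from disk
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used page


def run_query(cache, key_lo, key_hi):
    """Range query: read the index page for each key, then the document's data page.

    Assumes documents are laid out in key order (hypothetical simplification).
    """
    for key in range(key_lo, key_hi):
        cache.read(("index", key // KEYS_PER_INDEX_PAGE))
        cache.read(("data", key // DOCS_PER_DATA_PAGE))


cache = PageCache(capacity=1000)
for _ in range(3):  # a hot workload that only ever reads keys 0..4999
    run_query(cache, 0, 5000)

index_pages = sum(1 for kind, _ in cache.pages if kind == "index")
data_pages = sum(1 for kind, _ in cache.pages if kind == "data")
print(index_pages, data_pages, cache.hits, cache.misses)  # prints: 50 500 29450 550
```

With these assumed fan-outs, a 1,000,000-document collection would have 10,000 index leaf pages, yet this workload leaves only 50 of them (0.5%) in cache, alongside 500 data pages. The working set is whatever the queries touch, index and data alike.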

      This advice ignores how WiredTiger works and gives customers flawed guidance. When customers read these statements, they inevitably open support cases in which we need to walk them back from what our documentation says, because the issue is ultimately more nuanced.

      An alternative approach would emphasize the need to manage the working set, describe how index and data access occur during a running query, and explain how making queries efficient leads to optimal performance. I'm happy to assist if we'd like to change the approach we're taking now.
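As a rough illustration of why query efficiency drives the size of the working set, the Python sketch below counts the distinct pages a query would pull into cache. It rests on toy assumptions, not on WiredTiger internals: the page fan-outs and collection size are hypothetical, documents are assumed to be laid out in key order, and internal B-tree pages are ignored. The function names are illustrative only.

```python
KEYS_PER_INDEX_PAGE = 100   # hypothetical: index keys per index leaf page
DOCS_PER_DATA_PAGE = 10     # hypothetical: documents per data page
TOTAL_DOCS = 1_000_000      # hypothetical collection size


def pages_for_index_range(key_lo, key_hi):
    """Distinct pages touched by an index range scan over [key_lo, key_hi).

    Assumes documents are stored in key order, so neighboring keys share data pages.
    """
    pages = set()
    for key in range(key_lo, key_hi):
        pages.add(("index", key // KEYS_PER_INDEX_PAGE))
        pages.add(("data", key // DOCS_PER_DATA_PAGE))
    return pages


def pages_for_collection_scan():
    """A collection scan reads every data page, however few documents match."""
    return {("data", p) for p in range(TOTAL_DOCS // DOCS_PER_DATA_PAGE)}


print(len(pages_for_index_range(0, 1000)))  # 10 index pages + 100 data pages
print(len(pages_for_collection_scan()))     # every data page in the collection
```

Under these assumptions, an index range scan matching 1,000 documents touches 110 pages, while a collection scan touches all 100,000 data pages. If matching documents were scattered rather than clustered in key order, the range scan could touch up to one data page per match, which is still far fewer pages than a full scan.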


          People

            Unassigned
            Shakir Sadikali (shakir.sadikali@mongodb.com)
            Shakir Sadikali
            Votes: 6
            Watchers: 16

            Dates

              Created:
              Updated:
              1 year, 32 weeks, 3 days ago