Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-29419

Dropping unused indexes adversely affects query performance

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: 3.2.11
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      1. Query a collection's index stats:

      db.coll.aggregate( { $indexStats: {} }, { $project: { _id: 0, name: 1, "accesses.ops": 1, "accesses.since": 1 } } ).pretty()
      

      2. Note which indexes are not in use, indicated by ops of 0 (zero).
      3. Drop the unused indexes.
      4. Note an increase in query response time.
      5. Add the indexes back.
      6. Note query response time returns to nominal levels.
      7. Query a collection's index stats again, note the re-added indexes still should zero ops, while all other indexes' ops counters have increased.

      Show
      1. Query a collection's index stats: db.coll.aggregate( { $indexStats: {} }, { $project: { _id: 0, name: 1, "accesses.ops": 1, "accesses.since": 1 } } ).pretty() 2. Note which indexes are not in use, indicated by ops of 0 (zero). 3. Drop the unused indexes. 4. Note an increase in query response time. 5. Add the indexes back. 6. Note query response time returns to nominal levels. 7. Query a collection's index stats again, note the re-added indexes still should zero ops, while all other indexes' ops counters have increased.

      Description

      The aggregate framework's $indexStats() operation reported several indexes with zero ops. The indexes were dropped. Immediately afterwards New Relic reported skyrocketing response times, going from a 99 percentile response time of 208 ms to 59,700 ms. All read tickets were in use.

      Adding the indexes back reverted the response time to nominal levels (~200 ms). Requerying the $indexStats() of the collection still showed the indexes were not used, i.e. ops was at zero. Other indexes on the collection showed advancing ops counters.

      Note this happened to a production system under load and took 17 minutes to recover.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              mark.agarunov Mark Agarunov
              Reporter:
              blunney Bob Lunney
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: