Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-29419

Dropping unused indexes adversely affects query performance

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 3.2.11
    • None
    • Environment:
    • ALL
    • Hide

      1. Query a collection's index stats:

      db.coll.aggregate( { $indexStats: {} }, { $project: { _id: 0, name: 1, "accesses.ops": 1, "accesses.since": 1 } } ).pretty()
      

      2. Note which indexes are not in use, indicated by ops of 0 (zero).
      3. Drop the unused indexes.
      4. Note an increase in query response time.
      5. Add the indexes back.
      6. Note query response time returns to nominal levels.
      7. Query a collection's index stats again, note the re-added indexes still should zero ops, while all other indexes' ops counters have increased.

      Show
      1. Query a collection's index stats: db.coll.aggregate( { $indexStats: {} }, { $project: { _id: 0, name: 1, "accesses.ops" : 1, "accesses.since" : 1 } } ).pretty() 2. Note which indexes are not in use, indicated by ops of 0 (zero). 3. Drop the unused indexes. 4. Note an increase in query response time. 5. Add the indexes back. 6. Note query response time returns to nominal levels. 7. Query a collection's index stats again, note the re-added indexes still should zero ops, while all other indexes' ops counters have increased.

      The aggregate framework's $indexStats() operation reported several indexes with zero ops. The indexes were dropped. Immediately afterwards New Relic reported skyrocketing response times, going from a 99 percentile response time of 208 ms to 59,700 ms. All read tickets were in use.

      Adding the indexes back reverted the response time to nominal levels (~200 ms). Requerying the $indexStats() of the collection still showed the indexes were not used, i.e. ops was at zero. Other indexes on the collection showed advancing ops counters.

      Note this happened to a production system under load and took 17 minutes to recover.

            Assignee:
            mark.agarunov Mark Agarunov
            Reporter:
            blunney Bob Lunney
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: