  Core Server / SERVER-48067

Reduce memory consumption for unique index builds with large numbers of non-unique keys

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.2.9, 4.7.0, 4.4.2
    • Affects Version/s: 4.2.0
    • Component/s: None
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Backport Requested: v4.4, v4.2
    • Sprint: Execution Team 2020-07-27
    • 0

      We use a vector to track all duplicate keys inserted on an index. Specifically, this happens in the phase when we dump keys from the external sorter into the WiredTiger (WT) bulk inserter.

      Due to the nature of hybrid index builds, we must track these duplicates until we temporarily stop writes and can see all writes to the table.

      If a collection has a large number of duplicate key violations, this vector can grow without bound. We can improve this behavior by writing the duplicates out in batches, so that only one batch is held in memory at a time.
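
A minimal sketch of the batching idea, in Python rather than the server's C++ and not the actual fix. The names `BatchedDuplicateRecorder`, `record_duplicates`, and `max_batch_bytes` are hypothetical, and the `flush` callback stands in for whatever durable write records the duplicates. It assumes keys arrive in sorted order, as they would from an external sorter, so a duplicate is any key equal to its predecessor.

```python
from typing import Callable, Iterable, List, Optional


class BatchedDuplicateRecorder:
    """Buffers duplicate keys and hands them off in bounded batches.

    Hypothetical illustration: `flush` represents the write that persists
    a batch of duplicates; `max_batch_bytes` caps the buffered key bytes.
    """

    def __init__(self, flush: Callable[[List[bytes]], None], max_batch_bytes: int) -> None:
        self._flush = flush
        self._max_batch_bytes = max_batch_bytes
        self._batch: List[bytes] = []
        self._batch_bytes = 0

    def record(self, key: bytes) -> None:
        # Buffer the key, then flush as soon as the batch reaches the cap.
        self._batch.append(key)
        self._batch_bytes += len(key)
        if self._batch_bytes >= self._max_batch_bytes:
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the callback and start a fresh one.
        if not self._batch:
            return
        self._flush(self._batch)
        self._batch = []
        self._batch_bytes = 0


def record_duplicates(sorted_keys: Iterable[bytes], recorder: BatchedDuplicateRecorder) -> int:
    """Walk keys in sorted order and record each key that repeats its predecessor."""
    previous: Optional[bytes] = None
    duplicates = 0
    for key in sorted_keys:
        if key == previous:
            recorder.record(key)
            duplicates += 1
        previous = key
    recorder.flush()  # Write out whatever remains in the final partial batch.
    return duplicates
```

With this shape, peak memory for duplicate tracking is bounded by roughly `max_batch_bytes` plus one key, instead of growing with the total number of duplicate key violations.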

      We should also consider using this ticket to address two related costs: the conversion of KeyString back to BSONObj to record duplicates, and the memory amplification from creating new vectors to copy key data.

            gregory.noma@mongodb.com Gregory Noma
            louis.williams@mongodb.com Louis Williams