  Core Server / SERVER-13432

Consider changing index build bucket fill %


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • None
    • Component/s: Index Maintenance, MMAPv1
    • None
    • Assigned Teams: Storage Execution
    • Operating System: ALL
    • Steps To Reproduce:

      repro.py is a short Python client that demonstrates how index size per document behaves when storing random MD5 hashes, using the following pattern:

      • Insert 200k documents (index size stabilizes around 75B/doc).
      • Resize (reindex) the index (drops to 50B/doc).
      • Add 10k documents in 1k increments (rises to 96B/doc).
      • Repeat the resize / add-10k cycle a few times to verify that the growth always happens.
      • Add 200k more documents (settles at an intermediate value again).

      The output of running the repro against 2.6.0-rc2 and 2.4.9 is attached.
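Taking the approximate per-document figures above at face value, and assuming the first add-10k cycle starts from the 200k-document index right after the resize, a quick size estimate (in MB of 10^6 bytes) shows the index roughly doubling while the document count grows by only 5%:

```latex
\begin{aligned}
S_{\text{before resize}} &\approx 200{,}000 \times 75\,\text{B} = 15\,\text{MB} \\
S_{\text{after resize}}  &\approx 200{,}000 \times 50\,\text{B} = 10\,\text{MB} \\
S_{\text{after +10k}}    &\approx 210{,}000 \times 96\,\text{B} \approx 20.2\,\text{MB}
  \quad\Rightarrow\quad \frac{S_{\text{after +10k}}}{S_{\text{after resize}}} \approx 2.0
\end{aligned}
```

The post-growth estimate (about 20 MB) also exceeds the roughly 15 MB the index occupied before the resize.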


    Description

      Take an index whose keys are nearly uniformly distributed; the repro uses MD5 hashes. Reindex the collection.
      The subsequent inserts into this collection cause nearly every bucket to split, so the index very quickly doubles in size. Because the index build packs buckets close to full, and uniformly distributed keys spread new inserts across all buckets, almost every bucket overflows after only a few new keys. The resulting size is often much larger than the index was before the reindex.
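The mechanism can be illustrated with a toy model of B-tree leaf buckets. This is a sketch under stated assumptions, not MMAPv1's actual index builder or split logic: buckets hold a fixed number of keys (`capacity`), the bulk build packs each bucket to a hypothetical `fill` fraction, and an overflowing bucket splits 50/50. Uniform random floats stand in for MD5 hashes, and all function names are hypothetical.

```python
import bisect
import random


def bulk_build(keys, capacity, fill):
    """Pack sorted keys left to right into leaf buckets of about fill * capacity keys."""
    per_bucket = max(1, round(capacity * fill))
    keys = sorted(keys)
    return [keys[i:i + per_bucket] for i in range(0, len(keys), per_bucket)]


def insert(buckets, key, capacity):
    """Insert key into the leaf bucket covering it; split 50/50 once it overflows."""
    firsts = [b[0] for b in buckets]  # linear rebuild per insert: fine for a toy
    i = max(0, bisect.bisect_right(firsts, key) - 1)
    bucket = buckets[i]
    bisect.insort(bucket, key)
    if len(bucket) > capacity:
        mid = len(bucket) // 2
        buckets[i:i + 1] = [bucket[:mid], bucket[mid:]]


def simulate(n_initial, n_new, capacity, fill, seed=0):
    """Return (buckets after bulk build, buckets after n_new uniform inserts)."""
    rng = random.Random(seed)
    buckets = bulk_build([rng.random() for _ in range(n_initial)], capacity, fill)
    before = len(buckets)
    for _ in range(n_new):
        insert(buckets, rng.random(), capacity)
    return before, len(buckets)


# Uniform keys stand in for MD5 hashes; 1,000 new keys is +5%, like 10k on 200k.
for fill in (1.0, 0.9, 0.7):
    before, after = simulate(20_000, 1_000, capacity=100, fill=fill)
    print(f"fill={fill:.0%}: {before} -> {after} buckets ({after / before:.2f}x)")
```

With `fill=1.0` nearly every bucket receives at least one new key and splits, so the bucket count roughly doubles; with `fill=0.7` the same inserts fit into existing free space. The trade-off is a larger index immediately after the build (286 vs. 200 buckets in this model). Real MMAPv1 buckets are sized in bytes rather than key counts and may choose a different split point, so the numbers here are only qualitative.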

      The reindex command should leave the index in a state where it does not double in size almost immediately afterwards.

      Attachments

        1. repro.py
          1 kB
        2. v2.4.9.log
          3 kB
        3. v2.6.0-rc2.log
          3 kB


          People

            Assignee: Backlog - Storage Execution Team (backlog-server-execution)
            Reporter: Rod Adams (rod.adams@mongodb.com)
            Votes: 0
            Watchers: 9
