Consider changing index build bucket fill %

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Index Maintenance, MMAPv1
    • Storage Execution
    • ALL
    • Steps to reproduce:

      repro.py is a short Python client that shows how index size per document behaves when storing random MD5 hashes, using the following pattern:

      • Insert 200k documents (index size stabilizes around 75 B/doc).
      • Resize (index size drops to 50 B/doc).
      • Add 10k documents in 1k increments (index size climbs to 96 B/doc).
      • Repeat the resize and the 10k-document add a few times to verify that this always happens.
      • Add 200k more documents (index size stabilizes at an intermediate value again).

      The attached logs show the output of running the repro against 2.6.0-rc2 and 2.4.9.


      Take an index whose keys have a near-perfectly uniform distribution; the repro uses MD5 hashes. Reindex the collection.
      The next batch of inserts into the collection causes all the buckets to split, so the index doubles in size very quickly. The resulting index is often much larger than it was before the reindex.

      The reindex command should leave the index in a state where it does not double in size almost immediately.
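
      The splitting behavior described above can be illustrated with a toy model. The sketch below is not MMAPv1's B-tree code: it models only leaf buckets, and the bucket capacity (100 keys), the fill fractions (1.0 and 0.9), and all function names are assumptions chosen for illustration. It bulk-builds buckets from MD5 keys at a given fill fraction, inserts 5% more keys, and reports how much the bucket count grew.

```python
import bisect
import hashlib


def md5_key(i):
    """Return a uniformly distributed hex key, similar to the hashes in the repro."""
    return hashlib.md5(str(i).encode()).hexdigest()


def bulk_build(keys, capacity, fill):
    """Pack sorted keys into leaf buckets, each holding `fill` of `capacity` keys."""
    per_bucket = max(1, round(capacity * fill))
    ordered = sorted(keys)
    return [ordered[i:i + per_bucket] for i in range(0, len(ordered), per_bucket)]


def insert(buckets, key, capacity):
    """Insert one key and split the target bucket in half if it overflows."""
    firsts = [b[0] for b in buckets]
    # Pick the last bucket whose first key is <= key (bucket 0 for smaller keys).
    idx = max(0, bisect.bisect_right(firsts, key) - 1)
    bucket = buckets[idx]
    bisect.insort(bucket, key)
    if len(bucket) > capacity:
        mid = len(bucket) // 2
        buckets[idx:idx + 1] = [bucket[:mid], bucket[mid:]]


def growth_after_inserts(fill, n_initial=20000, n_new=1000, capacity=100):
    """Return bucket count after the inserts divided by bucket count before."""
    buckets = bulk_build((md5_key(i) for i in range(n_initial)), capacity, fill)
    before = len(buckets)
    for i in range(n_initial, n_initial + n_new):
        insert(buckets, md5_key(i), capacity)
    return len(buckets) / before


for fill in (1.0, 0.9):
    print(f"fill={fill:.1f} growth={growth_after_inserts(fill):.2f}x")
```

      In this model, buckets built completely full split on the first insert that lands in them, so a small batch of uniformly distributed keys nearly doubles the bucket count, while leaving some slack at build time lets the same batch be absorbed with few splits. Whether a lower fill percentage is the right trade-off for the real index build is the question this ticket raises.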

        1. v2.6.0-rc2.log (3 kB)
        2. v2.4.9.log (3 kB)
        3. repro.py (1 kB)

            Assignee:
            [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            Rod Adams (Inactive)
            Votes:
            0
            Watchers:
            9

              Created:
              Updated:
              Resolved: