[SERVER-13432] Consider changing index build bucket fill % Created: 01/Apr/14  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance, MMAPv1
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rod Adams Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: repro.py, v2.4.9.log, v2.6.0-rc2.log
Issue Links:
Depends
Assigned Teams:
Storage Execution
Operating System: ALL
Steps To Reproduce:

repro.py is a short Python client that demonstrates how the index size per document behaves when storing random MD5 hashes under the following usage pattern:

  • Insert 200k documents (index size stabilizes around 75 B/doc)
  • Resize (drops to 50 B/doc)
  • Add 10k documents in 1k increments (rises to 96 B/doc)
  • Repeat the resize/add-10k cycle a few times to verify that it always happens
  • Add 200k more documents (stabilizes at a medium value again)

I included the output of running the repro for 2.6.0-rc2 and 2.4.9.
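
For readers without the attachments, the pattern above can be approximated without a server. The following is a minimal sketch, not repro.py and not MMAPv1's actual bucket code: it models only the leaf level of a B-tree as sorted buckets of keys. The class name `LeafBuckets`, the bucket capacity of 50 keys, the 50/50 split, and the `fill` parameter are all assumptions made for illustration.

```python
import bisect
import random


class LeafBuckets:
    """Toy model of the leaf level of a B-tree index (hypothetical, not MMAPv1 code)."""

    def __init__(self, capacity):
        self.capacity = capacity          # max keys per bucket (assumed value)
        self.buckets = [[]]               # each bucket is a sorted list of keys
        self.lows = [float("-inf")]       # lows[i] is the lower bound of buckets[i]

    def insert(self, key):
        # Find the bucket whose key range contains `key`.
        i = bisect.bisect_right(self.lows, key) - 1
        bucket = self.buckets[i]
        bisect.insort(bucket, key)
        if len(bucket) > self.capacity:
            # Overflow: split the bucket in half (assumed 50/50 split).
            mid = len(bucket) // 2
            right = bucket[mid:]
            del bucket[mid:]
            self.buckets.insert(i + 1, right)
            self.lows.insert(i + 1, right[0])

    def rebuild(self, fill):
        # Repack all keys in sorted order, filling each bucket to `fill` of capacity.
        keys = [k for b in self.buckets for k in b]
        per_bucket = max(1, int(self.capacity * fill))
        self.buckets = [keys[j:j + per_bucket]
                        for j in range(0, len(keys), per_bucket)] or [[]]
        self.lows = [float("-inf")] + [b[0] for b in self.buckets[1:]]

    def slots_per_key(self):
        # Allocated key slots per stored key; a stand-in for bytes per document.
        total = sum(len(b) for b in self.buckets)
        return len(self.buckets) * self.capacity / total


def simulate(n_initial=20000, n_extra=1000, capacity=50, fill=1.0, seed=1):
    """Insert random keys, rebuild, then add 5% more keys (same ratio as 10k on 200k)."""
    rng = random.Random(seed)
    index = LeafBuckets(capacity)
    for _ in range(n_initial):
        index.insert(rng.random())        # uniform keys, like MD5 hashes
    before = index.slots_per_key()
    index.rebuild(fill)
    packed = index.slots_per_key()
    for _ in range(n_extra):
        index.insert(rng.random())
    after = index.slots_per_key()
    return before, packed, after


if __name__ == "__main__":
    for fill in (1.0, 0.9):
        print(fill, simulate(fill=fill))
```

In this toy model, a rebuild with `fill=1.0` leaves every bucket one insert away from splitting, so a small batch of uniformly distributed keys splits most buckets and the size per key ends up above its pre-rebuild value. With `fill=0.9` the same batch causes few splits. The absolute numbers are not comparable to the B/doc figures in the attached logs.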

Participants:

 Description   

Take an index whose keys have a near-perfect uniform distribution. In the repro, I use an MD5 hash. Perform a reindex on the collection.
Subsequent inserts into this collection then cause all the buckets to split, so the index very quickly doubles in size. The resulting size is often much larger than the index was before reindexing.

The reindex command should leave the index in a state where it does not almost instantly double in size.



 Comments   
Comment by Eric Milkie [ 25/Jan/17 ]

I believe this request boils down to a change to the foreground index build's bucket fill percentage (by lowering it, or by adding jitter). Is that correct?
Background index builds are equivalent to inserting data records one by one into an empty index, so I believe that if you ran the reindex command for an index that was originally built in the background, you wouldn't see the same growth behavior.
I believe that in earlier versions of MongoDB, the bucket fill percentage for foreground index builds was set the same as the bucket split percentage, but that ended up wasting a lot of space in indexes that never added new keys within the current keyspace. The current setting does not waste as much space for such workloads, at the expense of the growth behavior for evenly distributed keys as you have noted in the Description.
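
The trade-off in this comment can be put in rough numbers. The sketch below is an illustrative approximation, not a description of the server's behavior: it assumes uniformly distributed keys, so the number of new keys landing in any one bucket is approximately Poisson-distributed. The function names, the capacity of 50 keys per bucket, and the 5% growth figure (10k documents added to 200k) are assumptions chosen for the example.

```python
import math


def split_probability(capacity, fill, growth):
    """Approximate chance that a bucket packed to `fill` overflows after growth.

    capacity: max keys per bucket (assumed value)
    fill:     fraction of capacity used by the index build
    growth:   new keys as a fraction of existing keys (0.05 = 5%)
    """
    packed = max(1, int(capacity * fill))
    headroom = capacity - packed          # inserts absorbed before a split
    lam = growth * packed                 # expected new keys per bucket
    # P(X <= headroom) for X ~ Poisson(lam); the bucket splits otherwise.
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k)
              for k in range(headroom + 1))
    return 1.0 - cdf


def reserved_fraction(fill):
    """Share of each bucket left empty by the build; wasted if no keys ever arrive."""
    return 1.0 - fill


if __name__ == "__main__":
    for fill in (1.0, 0.95, 0.9, 0.8):
        print(fill, reserved_fraction(fill), split_probability(50, fill, 0.05))
```

Under these assumptions, lowering the fill percentage trades a fixed amount of reserved space per bucket for a sharply lower chance that a small burst of evenly distributed inserts splits the bucket. That matches both sides of the trade-off: wasted space for indexes that never add keys within the existing keyspace, and rapid growth for evenly distributed keys.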
