[SERVER-11644] Allow to specify an index bucket split threshold on index creation Created: 08/Nov/13  Updated: 06/Dec/22  Resolved: 23/May/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alexander Komyagin Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Storage Execution
Participants:

 Description   

In specific use cases (consider an index on a timestamp taken at the moment of document creation), a database designer may have intrinsic knowledge of what the index page density should be and can take responsibility for balancing the conflicting goals of index compactness and page-split avoidance.

The mongod process already implements two page split strategies, 50% and 90%, so we could simply give users a way to specify a custom percentage when creating a new index.
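The timestamp case can be illustrated with a toy simulation. This is not mongod's actual B-tree code, and `split_page`, `load_ascending` and `split_pct` are hypothetical names. The model assumes a fixed page capacity counted in keys, and it assumes that ascending keys (such as creation timestamps) always land on the rightmost page. Under those assumptions, a page left behind by a split never receives another insert, so the split percentage alone determines how full it ends up.

```python
from typing import List, Tuple


def split_page(keys: List[int], split_pct: int) -> Tuple[List[int], List[int]]:
    """Split an overflowing page so the left page keeps about split_pct% of its keys."""
    cut = len(keys) * split_pct // 100
    cut = max(1, min(len(keys) - 1, cut))  # both halves must be non-empty
    return keys[:cut], keys[cut:]


def load_ascending(n_keys: int, capacity: int, split_pct: int) -> List[List[int]]:
    """Insert keys 0..n_keys-1 in ascending order, like creation timestamps."""
    pages: List[List[int]] = [[]]
    for key in range(n_keys):
        rightmost = pages[-1]  # ascending keys always land on the rightmost page
        rightmost.append(key)
        if len(rightmost) > capacity:  # page overflowed: replace it with two pages
            left, right = split_page(rightmost, split_pct)
            pages[-1:] = [left, right]
    return pages


def average_fill(pages: List[List[int]], capacity: int) -> float:
    """Fraction of the allocated key slots that actually hold keys."""
    return sum(len(p) for p in pages) / (len(pages) * capacity)


for pct in (50, 90):
    pages = load_ascending(10_000, capacity=100, split_pct=pct)
    print(f"split at {pct}%: {len(pages)} pages, average fill {average_fill(pages, 100):.1%}")
# split at 50%: 199 pages, average fill 50.3%
# split at 90%: 111 pages, average fill 90.1%
```

With 90% splits the ascending-key index ends up about 90% full. With 50% splits, roughly half of every page stays empty for good. With keys arriving in random order the trade-off reverses, because a page left 90% full would tend to receive more inserts and split again soon. That is the page-split-avoidance side of the balance.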



 Comments   
Comment by Alexander Gorrod [ 22/May/18 ]

The split percentage isn't as useful in WiredTiger as it is in other btree implementations for a few reasons:
1) WiredTiger uses a copy-on-write methodology, so the entire page is rewritten regardless of its fill ratio.
2) WiredTiger uses a variable page size on disk, so the fill percentage doesn't give much control over disk-space usage.
3) WiredTiger has completely independent in-memory and on-disk page formats. In-memory pages are generally allowed to grow quite large before being written back to disk (unless there is a lot of cache pressure). Most commonly, rewriting an in-memory page results in splits regardless of the original on-disk page fill percentage.
4) Compression adds another layer of complexity to choosing a useful split percentage. If you choose a lower split percentage, pages hold less data, which generally leads to poorer compression ratios.
5) In use cases where a lower split percentage is appropriate, the application generally gravitates towards that fill ratio anyway. The configured split percentage is used the first time a page is written. Once the page's content grows beyond the capacity of a single page, however, it is split into two pages with relatively low fill ratios. A concrete example: a page is written to disk with 90 records, which fill 90% of a 32k page (about 28.8k). An additional 20 records are added to the page, and it is written to disk again. It now holds 110 records, or 110% of a single page, so it is split into two pages of 55 records each (55% full).
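The arithmetic in item 5 can be replayed with a small model. This is a sketch under stated assumptions, not WiredTiger's reconciliation code. `write_page` and `PAGE_CAPACITY` are hypothetical, and capacity is counted in records rather than bytes. The model also assumes that an oversized page is split into the fewest pages that fit, with records spread evenly, which matches the 55/55 example.

```python
import math
from typing import List

# Assumption: one on-disk page holds 100 records (the comment's 32k page).
PAGE_CAPACITY = 100


def write_page(n_records: int, capacity: int = PAGE_CAPACITY) -> List[int]:
    """Record counts of the on-disk pages produced when an in-memory page
    holding n_records is written out.

    Simplification: a page that fits is written as-is; an oversized page is
    split into the fewest pages that fit, with records spread evenly.
    """
    n_pages = max(1, math.ceil(n_records / capacity))
    base, extra = divmod(n_records, n_pages)
    # The first `extra` pages get one extra record so the counts sum to n_records.
    return [base + 1] * extra + [base] * (n_pages - extra)


first = write_page(90)        # first write, filled to the configured 90%
second = write_page(90 + 20)  # 20 more records arrive before the next write
print(first, [n / PAGE_CAPACITY for n in first])    # [90] [0.9]
print(second, [n / PAGE_CAPACITY for n in second])  # [55, 55] [0.55, 0.55]
```

In this model the configured percentage only affects the first write. Once a page overflows, the resulting page sizes depend on how much data accumulated in memory, not on the setting.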

Given the above, I think exposing a way for applications to configure a split strategy in WiredTiger would be more confusing than helpful.

Generated at Thu Feb 08 03:26:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.