[SERVER-63609] Consider tuning WT tables for clustered collections

| Created: | 14/Feb/22 | Updated: | 03/Aug/23 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | clustered_collections |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Storage Execution |
| Description |
We have different table configurations for collections and indexes.

The rationale for using a high split_pct appears to be that it benefits append-only workloads, although 90 is also the WiredTiger default, so presumably there is no difference between collections and indexes in this respect. Clustered collections are not necessarily append-only, but 90 seems like a reasonable default for both.

Raising internal_page_max from the default of 4k would reduce tree depth when keys are larger, because more keys can fit on each internal page, so raising this value could benefit clustered collections whose keys are moderately large.
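For reference, this kind of tuning can be experimented with per collection through the storageEngine option to createCollection, without a server change. A minimal mongosh sketch, with the collection name and the 16k value as illustrative assumptions rather than recommended settings:

```js
// Create a clustered collection and override the WiredTiger internal page
// size (internal_page_max defaults to 4k). Illustrative experiment only.
db.createCollection("events", {
  clusteredIndex: { key: { _id: 1 }, unique: true },
  storageEngine: {
    wiredTiger: { configString: "internal_page_max=16k" }
  }
});
```
|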
| Comments |
| Comment by Irwin Dolobowsky [ 05/Jun/23 ] |
connie.chen@mongodb.com I'm working on a doc for this now. Thanks for the heads up. |
| Comment by Connie Chen [ 05/Jun/23 ] |
irwin.dolobowsky@mongodb.com - putting the "execution-product-sync" label on this, as we need product input on what our future investment in clustered collections is.

michael.gargiulo@mongodb.com - let us know if we need to prioritize the TS case separately. |
| Comment by Alexander Gorrod [ 03/Apr/23 ] |
louis.williams@mongodb.com - thanks for the write-up; it's a good idea to revisit this.

Regarding the 90% fill factor: it's tuned for the 32k pages that MongoDB collections use. Since WiredTiger has a 4k allocation size (the minimum increment used for our variably sized pages), a 90% fill factor leads to newly created pages holding 28k of data (90% of 32k is 28.8k, rounded down to the 4k allocation increment). That gives efficient disk space usage while leaving enough headroom to absorb some updates to a page without splitting.

For indexes, I think we use the same fill factor but a 16k page size. We could consider changing that, but it's less useful for indexes, since their update access pattern is more random, which means pages tend to grow and split over time, so the initial size/layout is less critical for disk usage.

I'd be happy to explore changing internal_page_max, though I'm not sure how to go about choosing a better value. Let me know if the Storage Engines team can provide any additional context to help this review.
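As a starting point for choosing a value, a back-of-the-envelope estimate of internal tree depth as a function of internal_page_max. This is a rough sketch: the 100-byte key size and one-million-leaf-page tree are assumed numbers, and per-key overhead, prefix compression, and partially filled pages are ignored.

```js
// Rough estimate of the number of internal B-tree levels. Assumes fixed-size
// keys and fully packed internal pages, so treat the output as directional.
function internalDepth(internalPageMaxBytes, keyBytes, leafPages) {
  const fanout = Math.floor(internalPageMaxBytes / keyBytes); // keys per internal page
  return Math.ceil(Math.log(leafPages) / Math.log(fanout));
}

internalDepth(4 * 1024, 100, 1e6);  // => 4 internal levels with the 4k default
internalDepth(16 * 1024, 100, 1e6); // => 3 internal levels at 16k
```
|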