[SERVER-63609] Consider tuning WT tables for clustered collections Created: 14/Feb/22  Updated: 03/Aug/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: clustered_collections
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Participants:

 Description   

We have different table configurations for collections and indexes:
Collections:

    // Setting this larger than 10m can hurt latency and degrade throughput if this
    // is the oplog. See SERVER-16247.
    ss << "memory_page_max=10m,";
    // Choose a higher split percent, since most usage is append only. Allow some space
    // for workloads where updates increase the size of documents.
    ss << "split_pct=90,";
    ss << "leaf_value_max=64MB,";

Indexes:

    // Separate out a prefix and suffix in the default string. User configuration will override
    // values in the prefix, but not values in the suffix.  Page sizes are chosen so that index
    // keys (up to 1024 bytes) will not overflow.
    ss << "type=file,internal_page_max=16k,leaf_page_max=16k,";

The rationale for the high split_pct appears to be that it benefits append-only workloads, though 90 is also WiredTiger's default, so collections and indexes presumably end up with the same value. Clustered collections are not necessarily append-only, but 90 seems like a reasonable default for both.

Raising "internal_page_max" from the default of 4k would reduce tree depth when there are larger keys because more keys can fit on each internal node, so raising this value could benefit clustered collections whose keys are moderately large.



 Comments   
Comment by Irwin Dolobowsky [ 05/Jun/23 ]

connie.chen@mongodb.com I'm working on a doc for this now. Thanks for the heads up.  

Comment by Connie Chen [ 05/Jun/23 ]

irwin.dolobowsky@mongodb.com - putting the "execution-product-sync" label on this, as we need product input on what our future investment in clustered collections will be.

michael.gargiulo@mongodb.com - let us know if we need to prioritize the TS case separately. 

Comment by Alexander Gorrod [ 03/Apr/23 ]

louis.williams@mongodb.com - thanks for the write up - it's a good idea to revisit this.

Regarding the 90% fill factor: it's tuned for the 32k pages that MongoDB collections use. Since WiredTiger has a 4k allocation size (the minimum increment used for our variably sized pages), a 90% fill factor leads to newly created pages holding 28k of data - efficient disk space usage, with enough headroom to absorb some updates to the page without splitting.
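
For reference, the arithmetic behind those numbers, as a sketch using the values stated above (nothing here is measured):

    #include <cstdio>

    int main() {
        // 32 KiB leaf pages, 4 KiB allocation size, split_pct=90 (values from
        // the comment above).
        constexpr int leafPageMax = 32 * 1024;
        constexpr int allocSize = 4 * 1024;
        constexpr int splitPct = 90;
        constexpr int target = leafPageMax * splitPct / 100;     // 29491 bytes (~28.8k)
        constexpr int rounded = target / allocSize * allocSize;  // 28672 bytes (28k)
        std::printf("fill target=%d rounded=%d headroom=%d\n",
                    target, rounded, leafPageMax - rounded);
        return 0;
    }

That reproduces the 28k of data plus 4k of headroom described in the comment.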

For indexes, I think we use the same fill factor, but a 16k page size. We could consider changing that, but it's less useful for indexes, since their access pattern involves more random updates, which means pages tend to grow and split over time and the initial size/layout is less critical for disk usage.

I'd be happy to explore changing internal_page_max - I'm not sure how to go about choosing a better value.
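
For concreteness, the change itself would presumably be a one-line addition to the collection table config builder, along these lines (a hypothetical sketch; the predicate name and the 16k value are placeholders, not part of the current code):

    // Hypothetical: widen internal pages for clustered collections so that
    // moderately large cluster keys still leave a reasonable per-node fanout.
    if (isClusteredCollection) {
        ss << "internal_page_max=16k,";
    }

The open question is the value itself; the depth estimate sketched in the description is one way to narrow the candidates before benchmarking.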

Let me know if the Storage Engines team can provide any additional context to help this review.
