The problem description here is:
We've seen a pair of tickets recently (WT-6924 and a recent HELP ticket) where traffic to the history store increases over time when there are a lot of updates and the durable timestamp is pinned, i.e., when we keep adding things to the history store without aging older things out.
This ticket is a placeholder to capture the fact that this might be an issue. I have a suspicion of why this is happening (below). But I think we'll need more data on use cases before deciding if this is actually a problem. We can't optimize for everything.
Here's my theory about what's happening. When there are a lot of updates, there is a good chance that when WiredTiger reconciles a dirty page it will have updates to multiple keys, and in this scenario WT will add the older values to the history store. Ideally the keys we need to update in the HS will also fall on just one or two pages, since the keys in the history store are typically sorted in the same order as in the active btree.
But over time, more and more updates accumulate in the history store (because we're not moving the durable timestamp). So the amount of space required for each key in the history store goes up. As a result, the number of unique keys stored on each page in the history store file goes down. This means that as the history store grows, reconciliation will need more distinct pages from the history store to update the same set of keys.
To make this concrete, suppose every update requires 100 bytes in the history store, pages are 4KB, and we have a set of sequentially numbered keys from 1 to N. If every key has a single old value stored in the history store, then a page of the history store holds information about roughly 40 keys. So if reconciliation needs to update keys 5, 15, 25, and 35, there is a good chance they are all on the same history store page. If we keep updating all of the keys until there are 10 old values for each key in the history store, then we will have 1000 bytes per key in the history store, and each history store page will have information about only 4 keys. At this point, the same reconciliation that updates keys 5, 15, 25, and 35 will likely need 4 different pages from the history store.
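
The arithmetic above can be checked with a small sketch. This is a toy model, not WiredTiger code: the function names are made up for illustration, and it assumes fixed-size updates, keys packed densely in order starting at key 1, and no page overhead, compression, or splits.

```python
# Toy model of history store page fan-out. All names here are hypothetical
# and none of this is WiredTiger API.

PAGE_BYTES = 4096        # "pages are 4KB"
BYTES_PER_UPDATE = 100   # "every update requires 100 bytes"


def keys_per_page(versions_per_key, page_bytes=PAGE_BYTES,
                  bytes_per_update=BYTES_PER_UPDATE):
    """Number of distinct keys that fit on one history store page."""
    return max(1, page_bytes // (bytes_per_update * versions_per_key))


def pages_touched(keys, versions_per_key):
    """Count the distinct history store pages holding the given keys.

    Assumes keys are numbered from 1 and laid out in order, so key k
    lands on page (k - 1) // keys_per_page.
    """
    per_page = keys_per_page(versions_per_key)
    return len({(k - 1) // per_page for k in keys})


if __name__ == "__main__":
    updated = [5, 15, 25, 35]
    for versions in (1, 10):
        print(versions, keys_per_page(versions), pages_touched(updated, versions))
```

With one old value per key, the model puts all four keys on a single page; with 10 old values per key, each of the four keys lands on a different page, which matches the reasoning above. Real page contents will vary, so treat the numbers as illustrative only.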