Priority: Major - P3
Resolution: Works as Designed
Affects Version/s: None
Fix Version/s: None
Sprint: Storage - Ra 2020-10-19, Storage - Ra 2020-11-02, Storage - Ra 2020-11-16
In a recent support ticket, Bruce Lucas found a scenario with a replica set member down where we see rapid growth in the history store compared with the lookaside file in 4.2 (~2.5 GB vs ~15 GB). For full context, read the description and comments in the linked ticket.
When observing statistics, we noticed:
- Stalling during checkpoint.
- Cache usage spiking during checkpoint.
- History store file taking up lots more disk space.
For now, I think the large history store file can explain the other two symptoms, so we're most interested in diagnosing that. With lots of history store pages, the history store can dominate the cache. Since the history store file is last in line to get checkpointed, those pages stay pinned in the cache during the checkpoint, which leads to high cache usage and to stalling as application threads are tasked with eviction.
In the repro, we are creating a record with a 100-byte string and two integers and modifying the two integers repeatedly. From a WiredTiger point of view, this is going to look like lots of tiny modify updates.
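To illustrate why this write pattern looks like tiny modifies, here is a minimal sketch. The fixed byte layout and the helper names (`pack_record`, `changed_offsets`) are assumptions for illustration only; this is neither the real BSON encoding nor WiredTiger's modify format.

```python
import struct

def pack_record(payload, a, b):
    """Hypothetical layout: a 100-byte string followed by two little-endian int32s."""
    if len(payload) != 100:
        raise ValueError("payload must be exactly 100 bytes")
    return payload + struct.pack("<ii", a, b)

def changed_offsets(old, new):
    """Return the byte offsets at which two equal-length records differ."""
    return [i for i, (x, y) in enumerate(zip(old, new)) if x != y]

payload = b"x" * 100
before = pack_record(payload, 1, 2)
after = pack_record(payload, 3, 4)   # only the two integers change

diff = changed_offsets(before, after)
print(len(before), diff)
```

In this layout every changed byte sits in the last 8 bytes of the record, so each update touches only a small fraction of the value while the 100-byte string stays the same.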
One theory is that, since the records are small and we're storing a time window for each one in the history store, this cell metadata may dominate the disk usage. I added some logging in the cell packing code and found that about 4.5 GB of uncompressed, unpacked integers were being written to cells, which doesn't seem to be enough to account for the difference in disk usage.
Another theory is that when pages are evicted in between checkpoints, their disk image is added to the data file, meaning they can exist twice. Since lookaside wasn't checkpointed, this couldn't happen in 4.2. In the absolute worst case (which can't happen with our write pattern), we could get a 2x increase, which is still not enough.
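As a rough back-of-envelope check of the two theories so far, the sketch below uses the figures quoted in this ticket. It assumes that ~2.5 GB is the 4.2 lookaside file and ~15 GB is the history store, and that the 2x worst case is measured against the lookaside size; both mappings are assumptions, not something measured here.

```python
# Sizes in GB, taken from the ticket text.
LOOKASIDE_GB = 2.5        # assumption: the 4.2 lookaside file
HISTORY_STORE_GB = 15.0   # assumption: the history store file
CELL_METADATA_GB = 4.5    # uncompressed, so an upper bound on its disk cost

gap = HISTORY_STORE_GB - LOOKASIDE_GB

def unexplained(gap_gb, explained_gb):
    """Portion of the size gap left over after one theory is accounted for."""
    return max(gap_gb - explained_gb, 0.0)

# Theory 1: time window cell metadata accounts for the extra space.
metadata_left = unexplained(gap, CELL_METADATA_GB)

# Theory 2: every page exists twice, i.e. one extra copy of the baseline.
double_left = unexplained(gap, LOOKASIDE_GB)

print(gap, metadata_left, double_left)
```

Under these assumptions, each theory taken on its own leaves most of the gap unexplained.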
I added some logging to print the number of checkpoints in the system in __ckpt_process, but I only ever saw 0-1 printed, which seems to indicate that old checkpoints are not being kept around. Also, I am seeing blocks constantly being freed, which shouldn't be the case if we were keeping checkpoints around like this.
This one still seems suspicious but I haven't seen much evidence to confirm it.
You'll probably have to change some paths depending on where your mongo repo is in order to get this working.