Priority: Major - P3
Resolution: Works as Designed
Affects Version/s: None
Fix Version/s: None
I talked with Martin Bligh this morning, and I had misunderstood how tcmalloc works internally: while tcmalloc doesn't round up to a power-of-two, it does round each allocation up to a bucket size.
The current tcmalloc bucket sizes are 32, 48, 64, 80, 96, 128, 160.
The WT_PAGE structure is currently 64B (wasting 0B), and the WT_PAGE_MODIFY structure is currently 104B (rounded up to the 128B bucket, wasting 24B). Merged, the two come to 168B; if we get rid of 8B, we hit the 160B bucket exactly, with one fewer allocation per in-memory page.
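To make the arithmetic concrete, here is a minimal sketch of the bucket rounding, assuming only the bucket sizes listed in this comment (the real tcmalloc size-class table is longer and may differ by version). The names `bucket_for` and `waste` are hypothetical helpers for illustration, not tcmalloc or WiredTiger APIs.

```python
import bisect

# Bucket sizes as listed in this comment; an assumption, not the full
# tcmalloc size-class table.
BUCKETS = [32, 48, 64, 80, 96, 128, 160]

def bucket_for(size):
    """Return the smallest listed bucket that can hold `size` bytes."""
    i = bisect.bisect_left(BUCKETS, size)
    if i == len(BUCKETS):
        raise ValueError(f"{size}B is larger than any listed bucket")
    return BUCKETS[i]

def waste(size):
    """Bytes lost to rounding when `size` bytes are allocated."""
    return bucket_for(size) - size

WT_PAGE = 64          # structure sizes quoted in this comment
WT_PAGE_MODIFY = 104

# Today: two allocations for a modified page.
separate_waste = waste(WT_PAGE) + waste(WT_PAGE_MODIFY)

# Proposed: one allocation, after trimming 8B from the merged structure.
merged_size = WT_PAGE + WT_PAGE_MODIFY - 8
merged_waste = waste(merged_size)

print(f"separate: 2 allocations, {separate_waste}B wasted")
print(f"merged:   1 allocation of {merged_size}B, {merged_waste}B wasted")
```

Without the 8B trim, the merged structure would be 168B and would not fit any of the buckets listed here, which is why the trim matters.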
That hurts read-only applications (unless we do something tricky and only allocate the smaller structure in that case, which I probably wouldn't try to do absent a strong reason), but having one fewer allocation per in-memory page, with less wasted space, should be a win for the typical MongoDB application.
Alex, you had concerns about the fragility of this change. If the 8B we need to get back from the merged structure comes from removing WT_PAGE.modify, the change is going to be invasive: lots of lines of code will change. That said, I believe the change is unlikely to introduce problems, because we're switching from a structure that may or may not be in memory to one that is always in memory.
We might be able to keep WT_PAGE.modify and still get the merged structure down to 160B, but I haven't tried yet. If we can do that, the code changes become trivial.
Martin notes: tweaking the tcmalloc buckets isn't trivial, and creating a separate 104B bucket isn't necessarily easy to do.