I've looked at what's going on with medium-btree. I don't have the solution (not yet, anyway), but I have some information on what's happening, so I wanted to share it with you in case you can kick off a background thread to give this some brain cycles over the weekend:
So it appears that __wt_row_search is very memory-intensive: it generates lots of misses in the last-level cache (LLC) and goes to memory a lot. As a result, it is slow.
LLC MPKI – last-level cache misses per thousand (kilo) instructions --> lower is better.
IPC – instructions per cycle --> higher is better.
Perf shows 530 million LLC misses per 60 billion instructions for __wt_row_search. That's a rate of about 8.8 LLC MPKI, which is huge! For comparison, the LLC miss rate of the readseq LevelDB benchmark (a very fast workload) is only 0.25 MPKI. The IPC of __wt_row_search is 0.31; for comparison, the IPC of readseq is 1.6!
The cache misses can be isolated to the following two places in the code, which account for 35% and 20% of the runtime, respectively (I am using the dev branch at #b4c9861, from May 15):
So in the second case, it's that WT_REF structure you mentioned.