[SERVER-22423] Index pages are preferentially evicted Created: 01/Feb/16 Updated: 06/Dec/22 Resolved: 10/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.9, 3.2.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Done | Votes: | 2 |
| Labels: | 3.7BackgroundTask, WTplaybook |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Backport Requested: | v3.4, v3.2 |
| Participants: |
| Description |
Observed result:
|
| Comments |
| Comment by Eric Milkie [ 10/Jun/19 ] |
|
Bruce, please reopen this if you suspect this issue still exists in 4.0+. |
| Comment by Asya Kamsky [ 20/Oct/17 ] |
|
bruce.lucas this is quite old - have we seen any actual workloads that may be slow, at least in part, because of this issue? (Also, it's not clear that 3.4 has the same issue, since it's not mentioned in the affected versions.) |
| Comment by Alexander Gorrod [ 14/Mar/16 ] |
|
bruce.lucas We have also opened |
| Comment by Keith Bostic (Inactive) [ 14/Mar/16 ] |
|
bruce.lucas, I didn't investigate why we start evicting index pages again at that point. Is there reason to believe the index pages aren't being naturally aged out of the cache? I would expect some index pages to be touched once and then age out of the cache without being touched again; is that unexpected in this workload? Also, the index pages are being evicted at about half the average rate of the collection pages (844 vs 313). Let me know if we should pursue that question and I will; otherwise, what's the next step on this ticket? |
| Comment by Bruce Lucas (Inactive) [ 10/Mar/16 ] |
|
keith.bostic, the graph you attached covers a considerably longer time range than the original graph above, so to facilitate comparison I've added markers to your graph similar to the markers on the original graph:
|
| Comment by Keith Bostic (Inactive) [ 07/Mar/16 ] |
|
bruce.lucas, we've just merged some changes into the WiredTiger develop branch based on
Can you take a look? |
| Comment by Alexander Gorrod [ 03/Mar/16 ] |
|
david.hows Please add comments to the Python code so that it's easier to understand the intent. It would also be beneficial to review the variable names you used so they are more self-describing. |
| Comment by David Hows [ 03/Mar/16 ] |
|
Thanks keith.bostic, good catch. I've fixed it locally and pushed the changed version up. |
| Comment by Keith Bostic (Inactive) [ 02/Mar/16 ] |
|
david.hows, I think there's a bug in the ex_mongo.py script:
With that change, I see better results:
|
| Comment by David Hows [ 11/Feb/16 ] |
|
I've gone and done a comparative reproduction between 3.0 and 3.2. In both cases more pages are evicted from indexes than from collections, but the difference is far smaller in 3.2: in 3.0 we see ~2000 collection pages vs ~6600 index pages, compared with ~2300 vs ~2800 pages in 3.2.

I've also looked into how the tables generated by Bruce's shell script far above compare structurally with those generated by the Python script above. For the most part the 3.2 tables and the WT tables show the same numbers of internal and leaf pages (with small variation). After some deeper diving into the page structures, the only real difference between the WT and MongoDB data sets is that there are 26 very small leaf pages in the MongoDB index table. These small pages are between 1212B and 2669B in size with between 234 and 478 entries, compared with an average of 12280B and ~2000 entries for pages in both the MongoDB and WT index tables. These, and some small variation in the number of entries in the larger pages in the MongoDB tables, are the only notable differences. |
| Comment by David Hows [ 10/Feb/16 ] |
|
I have written a WT Python script that aims to emulate the workload we are testing. Given the outputs here, I currently believe the issue exists as reported in WT itself. There is a noticeable difference: within this repro we are reading in significantly less data in the indexes as a raw value. However, as a percentage of the size of the tables in question, we are reading a far higher proportion of index data (33%+), as the index table is around 150MB compared to the 1.5GB collection table. Below you can see how WT reads far more index pages than collection pages. You can also see that the index reads account for far less data, but this does (as mentioned) represent a far higher proportion of the whole collection being read into RAM each second.
|
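The ex_mongo.py script referenced in this thread is not included in this export. Purely as an illustrative sketch of the kind of standalone WiredTiger Python workload described above (a large "collection" table, a much smaller "index" table, and random point reads under a constrained cache), something like the following could be used; the table names, sizes, cache configuration, and loop counts are assumptions, not the actual repro.

```python
# Illustrative sketch only -- not the actual ex_mongo.py. Assumes the
# WiredTiger Python bindings are built and importable as "wiredtiger".
import os
import random
import wiredtiger

HOME = "WT_HOME"
os.makedirs(HOME, exist_ok=True)

# Deliberately small cache so the read phase forces eviction (size assumed).
conn = wiredtiger.wiredtiger_open(HOME, "create,cache_size=500MB,statistics=(fast)")
session = conn.open_session()

# One large "collection" table and one much smaller "index" table, roughly
# echoing the ~1.5GB vs ~150MB sizes mentioned above.
session.create("table:collection", "key_format=S,value_format=S")
session.create("table:index", "key_format=S,value_format=S")

NDOCS = 1000 * 1000
ccur = session.open_cursor("table:collection")
icur = session.open_cursor("table:index")
for n in range(NDOCS):
    key = "key-%09d" % n
    ccur.set_key(key)
    ccur.set_value("x" * 1000)   # ~1KB "document"
    ccur.insert()
    icur.set_key(key)
    icur.set_value(key)          # index entry carries only key material
    icur.insert()
ccur.close()
icur.close()

# Random point reads that touch the index first and then the collection,
# the way an indexed MongoDB query would.
ccur = session.open_cursor("table:collection")
icur = session.open_cursor("table:index")
for _ in range(100 * 1000):
    key = "key-%09d" % random.randrange(NDOCS)
    icur.set_key(key)
    if icur.search() == 0:
        ccur.set_key(key)
        ccur.search()
ccur.close()
icur.close()
conn.close()
```

During the read phase, per-table cache statistics (for example via "statistics:table:collection" and "statistics:table:index" cursors) could be sampled to compare pages and bytes read into cache for the two tables, which is the comparison described above.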
| Comment by David Hows [ 05/Feb/16 ] |
|
Scratch that, found the difference: I had an incorrect leaf page size from an earlier test that wasn't being displayed when I reviewed the data files due to |
| Comment by David Hows [ 04/Feb/16 ] |
|
I've been doing some reproduction and wanted to answer what I could of your initial questions.
|
| Comment by Michael Cahill (Inactive) [ 04/Feb/16 ] |
|
david.hows, the interesting questions are:
If our LRU selection for eviction were perfect, we don't think that page size or the number of pages per tree should matter. The intuition is that index pages should stick in cache more because they're hotter: we keep revisiting pages in indexes more often because there are more keys per page. But in practice we only approximate LRU: we don't bump the read generation all that often, we only take a certain number of pages per tree, etc. Any or all of those could be contributing to us making suboptimal choices about which pages to evict. |
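To make the approximate-LRU point concrete, below is a toy model (an illustration only, not WiredTiger's eviction code) of two of the approximations listed above: the read generation is recorded on only a fraction of accesses, and the eviction walk takes a bounded number of candidate pages from each tree. All constants are invented.

```python
# Toy model of approximate LRU with per-tree candidate sampling. This is
# not WiredTiger's actual eviction algorithm; constants are illustrative.
import random

BUMP_PROBABILITY = 0.25     # read generation recorded on only some accesses
CANDIDATES_PER_TREE = 10    # bounded number of candidate pages taken per tree

class Page:
    def __init__(self, tree):
        self.tree = tree
        self.read_gen = 0   # last access time that happened to be recorded

def access(page, now):
    # Approximate LRU: the access is only sometimes reflected in read_gen.
    if random.random() < BUMP_PROBABILITY:
        page.read_gen = now

# A big collection tree and a small index tree (10:1, echoing ~1.5GB vs ~150MB).
trees = {
    "collection": [Page("collection") for _ in range(1000)],
    "index": [Page("index") for _ in range(100)],
}

# Each "query" touches one collection page and one index page, so index
# pages are hotter per page (there are ten times fewer of them).
for now in range(1, 20000):
    access(random.choice(trees["collection"]), now)
    access(random.choice(trees["index"]), now)

# Build the candidate pool the way a per-tree walk would, then evict the
# candidates with the oldest recorded read generation.
pool = []
for pages in trees.values():
    pool.extend(random.sample(pages, CANDIDATES_PER_TREE))
victims = sorted(pool, key=lambda p: p.read_gen)[:5]

total_pages = sum(len(pages) for pages in trees.values())
index_in_pool = sum(1 for p in pool if p.tree == "index")
print("index share of cached pages: %d%%" % (100 * len(trees["index"]) // total_pages))
print("index share of candidate pool: %d%%" % (100 * index_in_pool // len(pool)))
print("index pages among victims: %d of %d" %
      (sum(1 for p in victims if p.tree == "index"), len(victims)))
```

In this toy model the index tree holds about 9% of the cached pages but supplies half of the candidate pool, and the probabilistic read-generation updates add noise to which candidates look coldest; either effect could plausibly contribute to the kind of suboptimal eviction choices described above.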
| Comment by Bruce Lucas (Inactive) [ 02/Feb/16 ] |
|
alexander.gorrod, here's the script I used, including data collection.
|
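Bruce's script itself does not appear in this export. Purely as a hypothetical illustration of the general shape of workload and data collection discussed in this ticket (a collection plus secondary index larger than the WiredTiger cache, random indexed reads, and periodic sampling of cache statistics from serverStatus), a pymongo sketch might look like the following; the database name, field names, document sizes, counts, and durations are all assumptions.

```python
# Hypothetical sketch only -- not Bruce's original script. Assumes a local
# mongod started with a small WiredTiger cache (e.g. --wiredTigerCacheSizeGB 1).
import random
import time

from pymongo import ASCENDING, MongoClient

client = MongoClient("localhost", 27017)
coll = client.test.evict_repro

# Populate a collection whose data plus secondary index exceed the cache:
# ~1KB documents with an indexed random field.
coll.drop()
coll.create_index([("x", ASCENDING)])
batch = []
for n in range(2 * 1000 * 1000):
    batch.append({"_id": n, "x": random.random(), "pad": "x" * 1000})
    if len(batch) == 1000:
        coll.insert_many(batch)
        batch = []

# Random indexed reads, sampling WiredTiger cache statistics once a second.
deadline = time.time() + 600
last = time.time()
while time.time() < deadline:
    coll.find_one({"x": {"$gte": random.random()}})
    if time.time() - last >= 1:
        cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
        print(cache["pages read into cache"], cache["bytes read into cache"])
        last = time.time()
```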
| Comment by Alexander Gorrod [ 01/Feb/16 ] |
|
Thanks bruce.lucas - I'll get the reproducer working and work towards understanding why we are selecting pages for eviction poorly. It would be useful if you could add a shell script that generates the workload you describe.