[SERVER-22834] 1.7x performance regression in random queries Created: 24/Feb/16 Updated: 24/Jun/16 Resolved: 24/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Susan LoVerso |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Description |
It appears that a higher rate of pages walked per page evicted could be a significant factor, so possibly this is related to There is also a much higher rate of mutex calls. |
| Comments |
| Comment by Ramon Fernandez Marina [ 21/Jun/16 ] | ||||||||||||||||||||||||
|
The work to address this performance regression was done as part of the linked WT tickets, merged into MongoDB as of 3.3.6. EDIT: I've re-classified the resolution on this ticket as "Fixed", since many users will search the SERVER project when having issues, and I think it is better to indicate that this problem was fixed even if the actual code changes happened on the linked WT tickets. | ||||||||||||||||||||||||
| Comment by Susan LoVerso [ 19/May/16 ] | ||||||||||||||||||||||||
|
I reran this script comparing the mongodb-3.0 WT branch with all the recent eviction changes on WT develop (changeset 88801726). I compared the inserts (not discussed earlier, but interesting) and the query rates. For the query rates, I took the middle 90 seconds of throughput output from mongostat from the query-only workload. What I see is that current develop is slightly below 3.0 but much, much more stable. For inserts:
For queries (90 seconds) you can see the tight stable range for develop:
| ||||||||||||||||||||||||
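The "middle 90 seconds" measurement described in the comment above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: it assumes one mongostat sample per second already parsed into a list of query rates, and the helper names `middle_window` and `summarize` are hypothetical.

```python
from statistics import mean, pstdev


def middle_window(samples, window=90):
    """Return the central `window` samples of a per-second series.

    Assumes one sample per second, so 90 samples cover 90 seconds.
    If the series is not longer than the window, all of it is returned.
    """
    if len(samples) <= window:
        return list(samples)
    start = (len(samples) - window) // 2
    return list(samples[start:start + window])


def summarize(samples, window=90):
    """Summarize throughput over the middle window (range, mean, spread)."""
    mid = middle_window(samples, window)
    if not mid:
        raise ValueError("no samples to summarize")
    return {
        "min": min(mid),
        "max": max(mid),
        "mean": mean(mid),
        "stdev": pstdev(mid),
    }
```

Trimming the start and end of the run drops warm-up and wind-down samples; a narrow min/max range or a small standard deviation then corresponds to the "tight stable range" mentioned in the comment.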
| Comment by Daniel Pasette (Inactive) [ 02/May/16 ] | ||||||||||||||||||||||||
|
Pretty sure the repro script can be found in this comment which is from | ||||||||||||||||||||||||
| Comment by Susan LoVerso [ 02/May/16 ] | ||||||||||||||||||||||||
|
bruce.lucas or david.hows can one of you post the test script to the ticket? I'd like to test with the recent eviction fix. I think it will help this workload and I'd like to run comparisons. | ||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 19/Apr/16 ] | ||||||||||||||||||||||||
|
sue.loverso, once the dust has settled on the eviction work you are currently doing, can you please go back to this MongoDB workload and compare the performance of MongoDB 3.0 with MongoDB master + WT develop? | ||||||||||||||||||||||||
| Comment by David Hows [ 18/Mar/16 ] | ||||||||||||||||||||||||
|
Following up on the work from the other day, I have begun looking to see if there are any common factors amongst the pages being evicted by WT. From what I have gathered so far, the most commonly evicted pages are index pages. This is not unexpected: there are far fewer index pages, so we can expect to see more repetition. The interesting part is that the most popular of these pages are also the smallest. In the last several runs, the three most-evicted pages were all at a minimum 5x smaller than the average index page size (5K pages vs. 20K average). A follow-up test that moved the index page size to 32K from the default of 16K showed the same behaviour, with an average page size of ~40K and the most commonly evicted pages being 5K, followed by 6K for the second most commonly evicted page. Still more work to do here; I want to understand why these small index pages seem to be in such high demand. | ||||||||||||||||||||||||
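A minimal sketch of the kind of tally described above: count evictions per page and compare the size of the most-evicted pages with the overall average. The input format is an assumption (a list of `(page_address, size_in_bytes)` tuples already extracted from the verbose eviction output), and `eviction_profile` is a hypothetical helper, not part of WiredTiger.

```python
from collections import Counter, defaultdict


def eviction_profile(events, top_n=3):
    """Return (average evicted size, top_n most-evicted pages).

    `events` is a list of (page_address, size_in_bytes) tuples, one per
    eviction. Each entry of the returned list is
    (page_address, eviction_count, mean_size_of_that_page).
    """
    if not events:
        raise ValueError("no eviction events")

    counts = Counter()
    sizes = defaultdict(list)
    for page, size in events:
        counts[page] += 1
        sizes[page].append(size)

    # Average over all eviction events, not over distinct pages.
    avg_size = sum(size for _, size in events) / len(events)

    top = [
        (page, n, sum(sizes[page]) / len(sizes[page]))
        for page, n in counts.most_common(top_n)
    ]
    return avg_size, top
```

Comparing each entry's mean size with `avg_size` shows whether the most frequently evicted pages are unusually small, which is the pattern reported in this comment.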
| Comment by David Hows [ 17/Mar/16 ] | ||||||||||||||||||||||||
It's a pointer to a WT_PAGE object
These are pages picked as eviction candidates. I'm keying on the verbose message at evict_lru.c:1404
I'm doing a tree review now, want to confirm that the trees are roughly the same. | ||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 17/Mar/16 ] | ||||||||||||||||||||||||
|
Are those hex values pointers to something (like a WT_REF)? What does it mean for there to be one or two pages being evicted this frequently – is this a count of successful evictions or just attempts? Lastly, since we are just talking about 1-2 pages, can you see where they are in the tree somehow? | ||||||||||||||||||||||||
| Comment by David Hows [ 16/Mar/16 ] | ||||||||||||||||||||||||
|
I have done a further dive after my last disastrous conclusion. Having run 3.0 and 3.2 with some extra debug flags I can see the following: Average size of page evicted vs number of pages evicted
Top 10 pages evicted by count*
New
| ||||||||||||||||||||||||
| Comment by David Hows [ 11/Mar/16 ] | ||||||||||||||||||||||||
|
Trying to get some diagnostic data out as to what differs in how the data is processed in WT between versions. I accidentally linked MongoDB 3.0 against a compile of WT taken from the head of develop. This STILL showed the same performance degradation. So I suspect that the issue here is outside WT itself. Will follow this up further soon. | ||||||||||||||||||||||||
| Comment by David Hows [ 02/Mar/16 ] | ||||||||||||||||||||||||
|
I've gone and reviewed these changes and found that MongoDB master still suffers a major regression. MongoDB 3.0.9 throughput 27923.7 vs. MongoDB Master 19358.4. Will chase further tomorrow. Please ignore my earlier version of this comment as I found a bug in my version of this script. | ||||||||||||||||||||||||
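For reference, the size of the regression implied by the two throughput figures above can be worked out as below. This is just the arithmetic on the numbers quoted in the comment (the unit is not stated there); the `regression` helper is a hypothetical name.

```python
def regression(baseline, current):
    """Return (slowdown factor, percentage drop) of current vs baseline."""
    slowdown = baseline / current
    drop_pct = (baseline - current) / baseline * 100
    return slowdown, drop_pct


# Figures quoted in the comment: MongoDB 3.0.9 vs. MongoDB master.
slowdown, drop_pct = regression(27923.7, 19358.4)
print(f"{slowdown:.2f}x slower, {drop_pct:.1f}% lower throughput")
# prints "1.44x slower, 30.7% lower throughput"
```

On these figures, master is still well below 3.0.9, though by less than the 1.7x in the ticket title.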
| Comment by Michael Cahill (Inactive) [ 01/Mar/16 ] | ||||||||||||||||||||||||
|
david.hows, can you please repro bruce.lucas's work, and also test against the tip of 3.2 and master to see whether recent eviction work has had any impact? |