[SERVER-36221] [3.6] Performance regression on small updates to large documents Created: 20/Jul/18 Updated: 06/Dec/22 Resolved: 28/Feb/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.6.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | dmd-perf |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Sprint: | Storage NYC 2018-07-30, Storage NYC 2018-08-13, Storage NYC 2018-08-27, Storage NYC 2018-09-10, Storage NYC 2018-10-22, Storage NYC 2018-11-05 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
The update portion of the repro script repro-10MBx2.sh from the linked ticket shows a performance regression in 3.6.6 relative to 3.6.5 for small updates to large documents.

This seems to be related to eviction of large pages. Interestingly, even though we are doing only about 9 updates per second, we are doing 170 evictions per second.
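For orientation, the workload driven by that script is roughly of the following shape. This is a hedged sketch inferred from the script name ("10MBx2": two ~10MB documents) and the description above, not the actual contents of repro-10MBx2.sh; the collection name, update count, and field names are arbitrary.

    # Sketch only: two ~10MB documents receiving repeated tiny updates.
    # All specifics below (names, counts, connection target) are assumptions.
    mongo --quiet test --eval '
        var big = new Array(10 * 1024 * 1024).join("x");   // ~10MB filler string
        db.c.drop();
        db.c.insert({_id: 0, pad: big, n: 0});
        db.c.insert({_id: 1, pad: big, n: 0});
        var count = 1000, start = Date.now();
        for (var i = 0; i < count; i++) {
            db.c.update({_id: i % 2}, {$inc: {n: 1}});     // small update to a large document
        }
        print("updates/sec: " + count / ((Date.now() - start) / 1000));
    '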
| Comments |
| Comment by Kelsey Schubert [ 28/Feb/19 ] |
This regression no longer reproduces after the changes from the linked ticket.
| Comment by Bruce Lucas (Inactive) [ 04/Dec/18 ] |
Thanks Michael. That seems consistent with the following metrics for four runs (3.6.5, 3.6.6, 3.6.8, 4.0.2).
I think that the "pages evicted because they exceeded the in-memory maximum count" metric refers to pages evicted due to exceeding the maximum number of deltas, as you suggested. We also see, as you suggested, the oldest_timestamp being updated less often. Interestingly, the rate at which the oldest timestamp is updated is about the same as the rate of log flushes, so perhaps there is something that connects them? The change that caused this regression (identified by the git bisect below) appears to have reduced the rate of log flushes. In the 4.0.4 run, however, we see the rate of log flushes is as high as it was in 3.6.5, before that change.
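As an aside, a quick way to watch the counters being discussed here while the workload runs is to poll serverStatus(). This is a hedged sketch: the exact key names under wiredTiger.cache should be checked against the build under test rather than taken from this snippet.

    # Poll the WiredTiger cache stats once per second while the repro runs.
    # The counter names are assumptions based on the metric quoted above;
    # verify them against db.serverStatus().wiredTiger.cache on your build.
    while sleep 1; do
        mongo --quiet --eval '
            var c = db.serverStatus().wiredTiger.cache;
            print(new Date().toISOString(),
                  "evicted (in-memory max):", c["pages evicted because they exceeded the in-memory maximum count"],
                  "bytes in cache:", c["bytes currently in the cache"]);
        '
    done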
| Comment by Michael Cahill (Inactive) [ 04/Dec/18 ] |
brian.lane pinged me about this ticket. This workload is sensitive to two factors:

- how large a page holding these large documents can grow in cache before it is forcibly evicted, and
- how many delta updates can accumulate against a document before a full copy of the document is written.
The WT code for both of these is effectively the same between v3.6 and v4.0, but both are sensitive to how frequently the oldest_timestamp is updated. That's because WT won't retry eviction for a page if oldest_timestamp hasn't changed since the last time it tried, and because we keep buffering delta changes if the oldest_timestamp hasn't moved forward. Given this, I suspect some difference in replication is involved, and backporting that change just to address this performance regression may not be desirable. Instead, we could consider allowing pages containing large documents to grow larger and/or allowing more delta updates to accumulate before making full copies of documents. The former trades larger pages in cache for less frequent eviction; the latter will make some reads slower (because they have to apply more delta updates) in order to make the updates faster.
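As one concrete way to experiment with the first of those options, WiredTiger's memory_page_max setting (how large a page may grow in cache before it is force-evicted) can be overridden per collection through a configString at collection creation. This is a hedged sketch of such an experiment, not something proposed or validated on this ticket, and the 20MB value is an arbitrary example.

    # Create the test collection with a larger WiredTiger memory_page_max so pages
    # holding the 10MB documents are force-evicted less often. Illustrative only;
    # the chosen value and whether this actually helps are untested assumptions.
    mongo --quiet test --eval '
        db.c.drop();
        db.createCollection("c", {
            storageEngine: { wiredTiger: { configString: "memory_page_max=20MB" } }
        });
    '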
| Comment by Bruce Lucas (Inactive) [ 01/Oct/18 ] |
After a little investigation, here's some more information (some of this confirms or repeats comments from above):

So the questions seem to be:
| Comment by Bruce Lucas (Inactive) [ 28/Sep/18 ] |
A git bisect identified the following commit as the cause of this regression:

    a5656bc506fcdfe36d44cfb3c17969cbe35695c0 is the first bad commit
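For context, a bisect of this kind looks roughly like the following; the refs and test procedure shown are a reconstruction for illustration, not a record of what was actually run.

    # Hypothetical reconstruction of the bisect between the 3.6.5 and 3.6.6 tags.
    git bisect start
    git bisect bad r3.6.6     # regression reproduces on this tag
    git bisect good r3.6.5    # baseline without the regression
    # At each step: build mongod, run repro-10MBx2-regression.sh against it,
    # compare update throughput, and mark the commit:
    #   git bisect good    # or: git bisect bad
    # Repeat until git prints "<sha> is the first bad commit".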
| Comment by Bruce Lucas (Inactive) [ 20/Sep/18 ] |
It appears that this regression is fixed in 4.0.2 (but not 3.6.8), and in fact performance is both improved and less erratic, on the two workloads attached to this ticket:

- repro-10MBx2-regression.sh
- repro-10MBx2-erratic-1.sh

Each chart shows four runs, marked A-D, using 3.6.5, 3.6.6, 3.6.8, and 4.0.2 respectively.
| Comment by Benety Goh [ 31/Jul/18 ] |
Moving code review from the linked ticket.
| Comment by Alexander Gorrod [ 26/Jul/18 ] |
I'd start by looking at whether
| Comment by Bruce Lucas (Inactive) [ 25/Jul/18 ] |
Attached is a simpler script, repro-10MBx2-regression.sh, that shows the regression in 3.6.6. Also attached is stacks-regression.txt, which appears to show the update thread frequently in __rec_split_write. The script stops one of the secondaries while the updates are being applied; interestingly, the regression disappears if the secondary is not stopped. However, depending on factors like document size and number of documents, throughput under 3.6.6 can be very erratic even with all members of the replica set in operation, e.g.:
3.6.5 is shown on the left, 3.6.6 on the right. Attached are the repro script repro-10MBx2-erratic-1.sh and the ftdc data erratic-1.tar.
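For readers without the attachments, the secondary-stop step described above amounts to something like the following; the port number and the use of db.shutdownServer() are assumptions about how the script does it, not its actual contents.

    # Shut down one secondary of the replica set (the port is an assumed example),
    # then drive the small-update workload against the primary while it is down.
    mongo --port 27018 admin --eval 'db.shutdownServer({force: true})'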