[SERVER-31679] Increase in disk i/o for writes to replica set
Created: 23/Oct/17 | Updated: 30/Oct/23 | Resolved: 26/Feb/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | 3.6.0-rc0 |
| Fix Version/s: | 3.6.6, 3.7.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Geert Bosch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SWNA |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Sprint: | Storage 2018-02-12, Storage NYC 2018-03-12 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
A simple insert loop running against a 1-node replica set shows a large increase in journal syncs, disk write requests, and bytes written when going from 3.4.9 to 3.6.0-rc0.
It appears we're doing a flush on every update, even though there's no write concern. In the simple repro above this doesn't seem to affect performance (for reasons tbd), but with more complex workloads the additional journal i/o will impact performance. Does not reproduce with a standalone mongod.
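The increase in journal syncs and bytes written described above can be quantified by differencing two `serverStatus` snapshots. The following is a minimal sketch, not the tooling used for this ticket: it assumes the journal counters appear under `wiredTiger.log` with the names listed in `JOURNAL_COUNTERS`, and the snapshot values are made up for illustration.

```python
from typing import Dict, Mapping

# Counter names as they are assumed to appear under
# serverStatus()["wiredTiger"]["log"]; verify them against your server version.
JOURNAL_COUNTERS = (
    "log sync operations",
    "log write operations",
    "log bytes written",
)


def journal_rates(before: Mapping, after: Mapping, seconds: float) -> Dict[str, float]:
    """Return the per-second increase of each journal counter between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    log_before = before["wiredTiger"]["log"]
    log_after = after["wiredTiger"]["log"]
    return {
        name: (log_after[name] - log_before[name]) / seconds
        for name in JOURNAL_COUNTERS
    }


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 seconds apart; the numbers are invented.
    snap0 = {"wiredTiger": {"log": {
        "log sync operations": 100,
        "log write operations": 5000,
        "log bytes written": 1_000_000,
    }}}
    snap1 = {"wiredTiger": {"log": {
        "log sync operations": 400,
        "log write operations": 9000,
        "log bytes written": 3_000_000,
    }}}
    for name, rate in journal_rates(snap0, snap1, 10.0).items():
        print(f"{name}: {rate:.1f}/s")
```

Against a live server, the two snapshots could come from calling the `serverStatus` command before and after the workload, for example through a driver's generic command interface; comparing the resulting rates on 3.4.9 and 3.6.0-rc0 would show whether syncs scale with the number of writes.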
| Comments |
| Comment by Githook User [ 22/May/18 ] |
Author: {'username': 'GeertBosch', 'name': 'Geert Bosch', 'email': 'geert@mongodb.com'}
Message: (cherry picked from commit f23bcbfa6d08c24b5570b3b29641f96babfc6a34)
| Comment by Githook User [ 26/Feb/18 ] |
Author: {'email': 'geert@mongodb.com', 'name': 'Geert Bosch', 'username': 'GeertBosch'}
Message:
| Comment by Bruce Lucas (Inactive) [ 01/Feb/18 ] |
Following up on our discussion today, let's refocus this ticket on disk i/o due to log flushes, which is a large effect and is in itself a problem for some customers, and leave aside for now issues about impact on throughput, which are more subtle and easily conflated with other 3.6 performance regressions. Here's a multi-threaded version of the simple update workload in the initial comment:
Here are four pairs of runs at 1, 2, 4, and 8 threads, against a 2-node repl set; the first run of each pair is 3.4, the second is 3.6:
I would expect that if we can fix the single-threaded case, in the sense of restoring 3.6 to the 3.4 behavior of 30 flushes per second, we will have fixed the other cases, and I think we can then declare this ticket finished.
| Comment by Bruce Lucas (Inactive) [ 30/Jan/18 ] |
Note that there are two different workloads on this ticket, one single-threaded and one multi-threaded. I don't know if they are the same issue; if not, we should split off the multi-threaded workload into a separate ticket.
| Comment by Michael Cahill (Inactive) [ 30/Jan/18 ] |
sue.loverso, can you please investigate, and in particular see whether the changes contemplated in WT-3531 help, and whether bruce.lucas's workload can be made to sync the journal less?
| Comment by Bruce Lucas (Inactive) [ 24/Oct/17 ] |
It's difficult to separate out any impact of this issue from the impact of
The first two runs are updates without write concern on 3.4.9 then 3.6.0-rc0, showing a performance regression. The second pair of runs is 3.4.9 vs 3.6.0-rc0 with a mix of ordinary and j:true updates:
The second pair, with j:true in the mix, shows a larger performance regression. I suspect that the incremental regression may be due to the impact of the additional syncs on the j:true operations.
| Comment by Bruce Lucas (Inactive) [ 23/Oct/17 ] |
I have a multi-threaded workload that includes some j:true ops and shows a decrease in performance. Will post details tomorrow. I think the large increase in i/o could in itself be a problem for some users.
| Comment by Eric Milkie [ 23/Oct/17 ] |
This behavior is most likely due to WT-3531; we left out an optimization that was in 3.4 because of the difficulty of keeping it in 3.6. Bruce, I would not expect the simple insert loop here to actually complete slower than in 3.4, as the journal flushes are done in a separate thread from the writers, and we still group journal flushes.
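The difference between flushing on every write and grouping flushes in a separate thread can be illustrated with a toy model. This is only a sketch of the general group-commit idea, not MongoDB's or WiredTiger's implementation: the function names, the millisecond timestamps, and the 100 ms wake-up interval are all assumptions chosen for illustration.

```python
from typing import Iterable


def per_write_flushes(write_times_ms: Iterable[int]) -> int:
    """Count flushes when every write triggers its own journal sync."""
    return sum(1 for _ in write_times_ms)


def grouped_flushes(write_times_ms: Iterable[int], interval_ms: int) -> int:
    """Count flushes when a background flusher wakes every `interval_ms`
    and syncs only if at least one write arrived since its last wake-up."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    # A write at time t is covered by the first wake-up at or after t, that is
    # wake-up number ceil(t / interval_ms). Integer arithmetic avoids float error.
    wakeups_with_work = {-(-t // interval_ms) for t in write_times_ms}
    return len(wakeups_with_work)


if __name__ == "__main__":
    # Hypothetical workload: one write per millisecond for one second.
    writes = range(1, 1001)
    print("flush per write:", per_write_flushes(writes))
    print("grouped flushes:", grouped_flushes(writes, interval_ms=100))
```

In this model the writers never wait for the flusher, which matches the expectation that the simple loop need not complete slower; the cost of flushing per write shows up as extra disk i/o rather than as writer latency.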