[SERVER-31679] Increase in disk i/o for writes to replica set Created: 23/Oct/17  Updated: 30/Oct/23  Resolved: 26/Feb/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: 3.6.0-rc0
Fix Version/s: 3.6.6, 3.7.3

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Geert Bosch
Resolution: Fixed Votes: 0
Labels: SWNA
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File four.png     PNG File io.png     PNG File io.png    
Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-33537 High write IO after 3.6 upgrade Closed
Related
related to SERVER-36221 [3.6] Performance regression on small... Closed
related to SERVER-37233 Increase in disk i/o for writes to re... Closed
is related to SERVER-33812 First initial sync oplog read batch f... Closed
is related to WT-3531 Provide a way to flush the log if com... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Storage 2018-02-12, Storage NYC 2018-03-12
Participants:
Case:

 Description   

Simple update loop running against a 1-node replica set:

    db.c.insert({_id: 0, i: 0})
    for (i = 0; i < 100000; i++)
        db.c.update({_id: 0}, {$inc: {i: 1}})

Shows a large increase in journal syncs, disk write requests, and bytes written. 3.4.9 on the left, 3.6.0-rc0 on the right:

It appears we're doing a flush on every update, even though no j:true write concern is requested. In the simple repro above this doesn't seem to affect performance (for reasons TBD), but with more complex workloads the additional journal i/o will impact performance.

Does not reproduce with standalone mongod.
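One way to quantify the increase in journal syncs is to diff the WiredTiger log counters from serverStatus before and after the loop. This is a minimal sketch (not part of the original report), assuming the "log sync operations" and "log bytes written" statistics are exposed under db.serverStatus().wiredTiger.log in these builds:

    // Capture WiredTiger log (journal) counters, run the repro, report the delta.
    var before = db.serverStatus().wiredTiger.log
    db.c.insert({_id: 0, i: 0})
    for (i = 0; i < 100000; i++)
        db.c.update({_id: 0}, {$inc: {i: 1}})
    var after = db.serverStatus().wiredTiger.log
    print("log sync operations: " + (after["log sync operations"] - before["log sync operations"]))
    print("log bytes written:   " + (after["log bytes written"] - before["log bytes written"]))

Comparing these deltas between a 3.4.9 and a 3.6.0-rc0 node should show the same disparity as the charts.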



 Comments   
Comment by Githook User [ 22/May/18 ]

Author:

{'username': 'GeertBosch', 'name': 'Geert Bosch', 'email': 'geert@mongodb.com'}

Message: SERVER-31679 Limit journaling frequency without oplog waiters

(cherry picked from commit f23bcbfa6d08c24b5570b3b29641f96babfc6a34)
Branch: v3.6
https://github.com/mongodb/mongo/commit/a5656bc506fcdfe36d44cfb3c17969cbe35695c0

Comment by Githook User [ 26/Feb/18 ]

Author:

{'email': 'geert@mongodb.com', 'name': 'Geert Bosch', 'username': 'GeertBosch'}

Message: SERVER-31679 Limit journaling frequency without oplog waiters
Branch: master
https://github.com/mongodb/mongo/commit/f23bcbfa6d08c24b5570b3b29641f96babfc6a34

Comment by Bruce Lucas (Inactive) [ 01/Feb/18 ]

Following up our discussion today, let's refocus this ticket on disk i/o due to log flushes, which is a large effect and is in itself a problem for some customers, and leave aside for now questions about impact on throughput, which are more subtle and easily conflated with other 3.6 performance regressions.

Here's a multi-threaded version of the simple update workload in the initial comment:

function repro {
    threads=${1:-1}    # number of concurrent update threads (1, 2, 4, 8 in the runs below)
    for t in $(seq $threads); do
        mongo --eval "
            db.c.insert({_id: $t, i: 0})
            for (i = 0; i < 20000; i++) {
                db.c.update({_id: $t}, {\$inc: {i: 1}})
            }
        " &
    done
    wait
}

Here are four pairs of runs at 1, 2, 4, and 8 threads, against a 2-node repl set; the first run of each pair is 3.4, the second is 3.6:

  • In the single-threaded case, 3.4 (A) does only a constant 30 flushes per second (which is too small to even see at the vertical scale of that chart), whereas 3.6 (B) appears to do one flush per update (i.e. thousands of flushes per second).
  • As the number of threads increases, the disparity decreases, for this particular workload at least; above 8 threads (not shown above) there is rough parity.

I would expect that if we can fix the single-threaded case, in the sense of restoring 3.6 to the 3.4 behavior of ~30 flushes per second, we will have fixed the other cases as well, and I think we can then declare this ticket finished.
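To put a number on flushes per second during a run, one can poll the same journal counter from a second shell while the workload is active. A rough sketch (not part of the original analysis), again assuming the "log sync operations" statistic under db.serverStatus().wiredTiger.log:

    // Print the per-second rate of WiredTiger log sync operations.
    // Run in a separate mongo shell while the repro workload is executing.
    var prev = db.serverStatus().wiredTiger.log["log sync operations"]
    while (true) {
        sleep(1000)
        var cur = db.serverStatus().wiredTiger.log["log sync operations"]
        print(new Date() + "  flushes/sec: " + (cur - prev))
        prev = cur
    }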

Comment by Bruce Lucas (Inactive) [ 30/Jan/18 ]

Note that there are two different workloads on this ticket, one single-threaded and one multi-threaded. I don't know if they are the same issue; if not we should split off the multi-threaded workload into a separate ticket.

Comment by Michael Cahill (Inactive) [ 30/Jan/18 ]

sue.loverso, can you please investigate, and in particular see whether the changes contemplated in WT-3531 would help bruce.lucas's workload sync the journal less often?

Comment by Bruce Lucas (Inactive) [ 24/Oct/17 ]

It's difficult to separate out any impact of this issue from the impact of SERVER-31694 and/or other performance regressions in 3.6.0-rc0, but here's an attempt.

First two runs are updates without write concern on 3.4.9 then 3.6.0-rc0, showing a performance regression.

The second pair of runs are 3.4.9 vs 3.6.0-rc0 with a mix of ordinary and j:true updates:

        threads=20
        for t in $(seq $threads); do    
            mongo --eval "
                db.c.insert({_id: $t, i: 0})
                for (i = 0; i < 100000; i++) {
                    if (i%4 == 0)
                        db.c.update({_id: $t}, {\$inc: {i: 1}}, {writeConcern: {j: true}})
                    else
                        db.c.update({_id: $t}, {\$inc: {i: 1}})
                }
            " &
        done
        wait

The second pair of runs, with j:true in the mix, shows a larger performance regression. I suspect the incremental regression may be due to the impact of the additional syncs on the j:true operations.
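One way to check whether the additional syncs are what slows down the j:true operations specifically would be to time the two kinds of updates separately. This is only an illustrative sketch (the iteration count and _id are arbitrary), not one of the runs above:

    // Compare average latency of plain updates vs. j:true updates.
    function timeUpdates(n, wc) {
        var start = Date.now()
        for (var i = 0; i < n; i++) {
            if (wc)
                db.c.update({_id: 1}, {$inc: {i: 1}}, {writeConcern: wc})
            else
                db.c.update({_id: 1}, {$inc: {i: 1}})
        }
        return (Date.now() - start) / n
    }
    db.c.insert({_id: 1, i: 0})
    print("plain update avg ms:  " + timeUpdates(5000))
    print("j:true update avg ms: " + timeUpdates(5000, {j: true}))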

Comment by Bruce Lucas (Inactive) [ 23/Oct/17 ]

I have a multi-threaded workload that includes some j:true ops and shows a decrease in performance. Will post details tomorrow.

I think the large increase in i/o could in itself be a problem for some users.

Comment by Eric Milkie [ 23/Oct/17 ]

This behavior is most likely due to WT-3531; we left out an optimization that was in 3.4 because of the difficulty of keeping it in 3.6.
The optimization only applies to single-writer workloads, so I suspect more complex workloads would not actually be affected by this. I also believe the only significant effect would be increased latency for single-stream writer workloads that use a write concern of w:majority: the optimization makes it possible to overlap the time waiting for the journal on the primary with the time waiting for the journal on the secondaries, since the secondaries can see the oplog sooner and don't have to wait for the journal flush.

Bruce, I would not expect the simple update loop here to actually complete slower than in 3.4, as the journal flushes are done in a separate thread from the writers, and we still group journal flushes.
