[SERVER-37233] Increase in disk i/o for writes to replica set Created: 20/Sep/18  Updated: 27/Oct/23  Resolved: 24/Dec/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Kelsey Schubert
Resolution: Works as Designed Votes: 1
Labels: dmd-perf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: fast.png, slow.png
Issue Links:
Depends
Related
related to SERVER-35958 Big CPU load increase (×4) on seconda... Closed
related to SERVER-37695 add debug replOplogJournalDelayMillis... Closed
is related to SERVER-31679 Increase in disk i/o for writes to re... Closed
Operating System: ALL
Sprint: Storage NYC 2018-10-08, Storage NYC 2018-10-22, Storage NYC 2018-11-05
Participants:
Case:

 Description   
Issue Summary as of Dec 23, 2018

ISSUE SUMMARY
Beginning in MongoDB 3.6, administrators may observe an increase in disk i/o on mongod primaries.

ISSUE IMPACT
This increase in disk i/o should not generally be cause for concern.

To ensure writes received by secondaries are durable on the primary, oplog entries are journaled and written to disk before being replicated. As a result, mongod benefits from flushing the journal more frequently, at the cost of heavier disk utilization, so that these oplog entries become available for replication as soon as possible.

If the disk is already fully utilized, journal flushes become less frequent, so a disk-bound workload achieves the same overall node throughput as it did in earlier versions of MongoDB.

AFFECTED VERSIONS
MongoDB 3.6.x and subsequent major releases exhibit this behavior.

Original description

This is a continuation of SERVER-31679. It appears that the fix for that issue reduced the rate of journal flushes, but the rate is still significantly (about 30x) higher than in 3.4.

Simple workload (one insert followed by repeated updates to the same document):

function repro() {
    // Seed a single document, then increment its counter 100000 times.
    db.c.insert({_id: 0, i: 0})
    for (var i = 0; i < 100000; i++) {
        db.c.update({_id: 0}, {$inc: {i: 1}})
    }
}

Results on 3.4.17, 3.6.5, and 3.6.6 respectively:

Note that while SERVER-31679 has reduced the number of journal flush operations significantly between 3.6.5 and 3.6.6, the rate in 3.6.6, at about 300/s, is still about 30x higher than in 3.4.

This becomes clearer if we simulate an application that is doing a low rate of writes:

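function repro() {
    // Seed a single document, then increment its counter 10000 times.
    db.c.insert({_id: 0, i: 0})
    for (var i = 0; i < 10000; i++) {
        db.c.update({_id: 0}, {$inc: {i: 1}})
        // Pause 3 ms between updates to simulate a low write rate.
        sleep(3)
    }
}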

Results on 3.4.17, 3.6.5, and 3.6.6 respectively:

For this workload SERVER-31679 has not made any difference in the number of journal flush operations; it is still about 30x higher than in 3.4.
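
For anyone who wants to reproduce the comparison without charts, the following is a minimal sketch of how a flush rate could be estimated from two db.serverStatus() samples. The helper name journalFlushRate is hypothetical, and the choice of the WiredTiger "log sync operations" counter is an assumption: the ticket does not say which statistic the charts plot, so pass a different counter name if another one matches your graphs. The sample documents in the example are made up for illustration and are not measurements from this ticket.

```javascript
// Hypothetical helper (not from the original report): estimate a journal
// flush rate from two serverStatus-shaped documents taken elapsedMs apart.
// ASSUMPTION: the relevant counter is wiredTiger.log["log sync operations"].
function journalFlushRate(before, after, elapsedMs, counter) {
    counter = counter || "log sync operations";
    var delta = Number(after.wiredTiger.log[counter]) -
                Number(before.wiredTiger.log[counter]);
    return delta / (elapsedMs / 1000);  // operations per second
}

// Made-up samples shaped like serverStatus output: 3000 operations in 10 s.
var s0 = {wiredTiger: {log: {"log sync operations": 1000}}};
var s1 = {wiredTiger: {log: {"log sync operations": 4000}}};
var rate = journalFlushRate(s0, s1, 10000);

// In the mongo shell, the same helper could wrap the repro function:
//   var before = db.serverStatus(), t0 = Date.now();
//   repro();
//   print(journalFlushRate(before, db.serverStatus(), Date.now() - t0));
```

Running the wrapped version against each server version would give per-version rates that can be compared directly, in the same way the 3.4 and 3.6 figures are compared above.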

