[SERVER-19445] Performance regression in Mongo-Perf mms tests with wiredTiger Created: 16/Jul/15  Updated: 05/Feb/16  Resolved: 20/Jul/15

Status: Closed
Project: Core Server
Component/s: Performance, WiredTiger
Affects Version/s: 3.1.5
Fix Version/s: 3.1.6

Type: Bug Priority: Major - P3
Reporter: David Daly Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: mpreg
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File screenshot.png    
Issue Links:
Depends
is depended on by WT-1973 MongoDB changes for WiredTiger 2.7.0 Closed
is depended on by SERVER-19744 WiredTiger changes for MongoDB 3.0.6 Closed
Related
related to SERVER-20193 Performance regression in Single-thre... Closed
related to SERVER-20195 Performance regression in WiredTiger ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

In the shell

var setupMMS = function( collection ) {
    collection.drop();
 
    var base = { _id: 0, a: 0, h: {}, z: 0 };
    for (var i = 0; i < 24; i++) {
        base.h[i] = {};
        for (var j = 0; j < 60; j++) {
            base.h[i][j] = { n: 0, t: 0, v: 0 };
        }
    }
    collection.insert(base);
};
use test0
setupMMS(db.Update_MmsIncShallow10)
benchRun({"ops":[{"op":"update","query":{"_id":0},"update":{"$inc":{"a":1}},"ns":"test0.Update_MmsIncShallow10","safe":false,"w":0,"j":false,"writeCmd":true}],"seconds":10,"host":"127.0.0.1:27017","parallel":1})

The update line in the output shows the update throughput.

Participants:

 Description   

Running mongo-perf test Update.MmsIncShallow1 and the other mms tests shows a large regression (~90%) from master to 3.0.4. On evergreen, 3.1.5 gets 1100 ops/s and 3.0.4 gets 11,434.

The following data captures both runs. The first run (A to B) is with 3.0.4. The second run (C to D) is with 3.1.5.

In particular, there appears to be a lot of eviction traffic in the second case, but not the first. The test is 5 seconds long and the server was started immediately before the test.



 Comments   
Comment by Michael Cahill (Inactive) [ 17/Jul/15 ]

For posterity, what was going on in this workload was single-threaded updates to a single document. The document was moderately large, so if WiredTiger accumulated enough versions of the document, it would hit the 10MB limit on the size of a page in memory and forcibly evict the page.

There was a change in the drop for 3.1.5 that made WiredTiger lazier about updating the oldest transaction ID in the system. This value determines how many version of the document we think need to stay in cache. We used to update that value actively every time a transaction started, but some workloads go faster if regular operations don't update that shared state.

The unfortunate consequence of that change was to trigger eviction in this workload – the oldest ID was lagging far enough behind that we kept around enough versions of the document to reach the 10MB limit. The solution committed into WiredTiger develop is to bump that oldest transaction ID when a transaction commits, if it thought it was the oldest one running. This isn't exact, but particularly for single-threaded workloads it should have the effect of having the oldest ID track the current ID much more closely, which means we throw old versions of the document away much more quickly and never cause unnecessary eviction.

Comment by Githook User [ 17/Jul/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-19445 Have the oldest transaction update the oldest tracked ID.

This will help keep the oldest tracked transation ID from falling too far
behind, particularly in single-threaded workloads. That means we can keep
trimming obsolete updates, preventing pages from growing too large.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/d5bddf2b3f3e2acb0b4d5f161de0b522d01fa12e

Comment by David Daly [ 17/Jul/15 ]

I should add that this is a single threaded test workload, updating one document, so I wouldn't expect a need for a lot of sessions or concurrency.

Comment by David Daly [ 17/Jul/15 ]

chung-yen.chang It's a pretty dramatic change on that commit (order of magnitude). On my box it went from ~10.5k ops/sec to ~900 ops/s. Every commit I tested was either > 10k or < 1k.

Comment by Alexander Gorrod [ 17/Jul/15 ]

michael.cahill I'll assign this one to you for now. The commit was a merge with WiredTiger which contains a lot of changes. The graph is interesting - one of the oddest things is that the open session count has dropped from 70 down to near 0. Maybe we are succeeding in forced eviction now, when we were doing in-memory splits before.

Assign it back to me if I should investigate further.

Comment by Chung-yen Chang [ 17/Jul/15 ]

david.daly, how much of the original drop can be attributed to git c8310411? Is that pretty much it or there are other contributors that need to be sorted through?

Comment by David Daly [ 17/Jul/15 ]

agorrod Assigning to you for now since you are on the commit. Please let me know if that is okay.

Comment by David Daly [ 17/Jul/15 ]

Bisect done:

c8310411be00fc1046fcd2472bbb55a5e82cf84e is the first bad commit
commit c8310411be00fc1046fcd2472bbb55a5e82cf84e
Author: Alexander Gorrod <alexander.gorrod@mongodb.com>
Date:   Tue Jun 9 18:47:35 2015 +0000
 
    Import wiredtiger-wiredtiger-2.6.1-59-g1d2fe8a.tar.gz from wiredtiger branch mongodb-3.2
 
:040000 040000 73f116d665a56e98bb3a74871fffdf616f815823 e2d713f60138e05b75a78250e7d8933131db6691 M      src

Comment by David Daly [ 16/Jul/15 ]

I will go ahead and do a bisect on this to see if it can be narrowed down to a commit.

Generated at Thu Feb 08 03:50:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.