Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19445

Performance regression in Mongo-Perf mms tests with wiredTiger

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.1.5
    • Fix Version/s: 3.1.6
    • Component/s: Performance, WiredTiger
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      In the shell

      var setupMMS = function( collection ) {
          collection.drop();
       
          var base = { _id: 0, a: 0, h: {}, z: 0 };
          for (var i = 0; i < 24; i++) {
              base.h[i] = {};
              for (var j = 0; j < 60; j++) {
                  base.h[i][j] = { n: 0, t: 0, v: 0 };
              }
          }
          collection.insert(base);
      };
      use test0
      setupMMS(db.Update_MmsIncShallow10)
      benchRun({"ops":[{"op":"update","query":{"_id":0},"update":{"$inc":{"a":1}},"ns":"test0.Update_MmsIncShallow10","safe":false,"w":0,"j":false,"writeCmd":true}],"seconds":10,"host":"127.0.0.1:27017","parallel":1})
      

      The update line in the output shows the update throughput.

      Show
      In the shell var setupMMS = function( collection ) { collection.drop();   var base = { _id: 0, a: 0, h: {}, z: 0 }; for (var i = 0; i < 24; i++) { base.h[i] = {}; for (var j = 0; j < 60; j++) { base.h[i][j] = { n: 0, t: 0, v: 0 }; } } collection.insert(base); }; use test0 setupMMS(db.Update_MmsIncShallow10) benchRun({"ops":[{"op":"update","query":{"_id":0},"update":{"$inc":{"a":1}},"ns":"test0.Update_MmsIncShallow10","safe":false,"w":0,"j":false,"writeCmd":true}],"seconds":10,"host":"127.0.0.1:27017","parallel":1}) The update line in the output shows the update throughput.

      Description

      Running mongo-perf test Update.MmsIncShallow1 and the other mms tests shows a large regression (~90%) from master to 3.0.4. On evergreen, 3.1.5 gets 1100 ops/s and 3.0.4 gets 11,434.

      The following data captures both runs. The first run (A to B) is with 3.0.4. The second run (C to D) is with 3.1.5.

      In particular, there appears to be a lot of eviction traffic in the second case, but not the first. The test is 5 seconds long and the server was started immediately before the test.

        Issue Links

          Activity

          Hide
          david.daly David Daly added a comment -

          I will go ahead and do a bisect on this to see if it can be narrowed down to a commit.

          Show
          david.daly David Daly added a comment - I will go ahead and do a bisect on this to see if it can be narrowed down to a commit.
          Hide
          david.daly David Daly added a comment -

          Bisect done:

          c8310411be00fc1046fcd2472bbb55a5e82cf84e is the first bad commit
          commit c8310411be00fc1046fcd2472bbb55a5e82cf84e
          Author: Alexander Gorrod <alexander.gorrod@mongodb.com>
          Date:   Tue Jun 9 18:47:35 2015 +0000
           
              Import wiredtiger-wiredtiger-2.6.1-59-g1d2fe8a.tar.gz from wiredtiger branch mongodb-3.2
           
          :040000 040000 73f116d665a56e98bb3a74871fffdf616f815823 e2d713f60138e05b75a78250e7d8933131db6691 M      src
          

          Show
          david.daly David Daly added a comment - Bisect done: c8310411be00fc1046fcd2472bbb55a5e82cf84e is the first bad commit commit c8310411be00fc1046fcd2472bbb55a5e82cf84e Author: Alexander Gorrod <alexander.gorrod@mongodb.com> Date: Tue Jun 9 18:47:35 2015 +0000   Import wiredtiger-wiredtiger-2.6.1-59-g1d2fe8a.tar.gz from wiredtiger branch mongodb-3.2   :040000 040000 73f116d665a56e98bb3a74871fffdf616f815823 e2d713f60138e05b75a78250e7d8933131db6691 M src
          Hide
          david.daly David Daly added a comment -

          Alex Gorrod Assigning to you for now since you are on the commit. Please let me know if that is okay.

          Show
          david.daly David Daly added a comment - Alex Gorrod Assigning to you for now since you are on the commit. Please let me know if that is okay.
          Hide
          chung-yen.chang Chung-yen Chang added a comment -

          David Daly, how much of the original drop can be attributed to git c8310411? Is that pretty much it or there are other contributors that need to be sorted through?

          Show
          chung-yen.chang Chung-yen Chang added a comment - David Daly , how much of the original drop can be attributed to git c8310411? Is that pretty much it or there are other contributors that need to be sorted through?
          Hide
          alexander.gorrod Alexander Gorrod added a comment -

          Michael Cahill I'll assign this one to you for now. The commit was a merge with WiredTiger which contains a lot of changes. The graph is interesting - one of the oddest things is that the open session count has dropped from 70 down to near 0. Maybe we are succeeding in forced eviction now, when we were doing in-memory splits before.

          Assign it back to me if I should investigate further.

          Show
          alexander.gorrod Alexander Gorrod added a comment - Michael Cahill I'll assign this one to you for now. The commit was a merge with WiredTiger which contains a lot of changes. The graph is interesting - one of the oddest things is that the open session count has dropped from 70 down to near 0. Maybe we are succeeding in forced eviction now, when we were doing in-memory splits before. Assign it back to me if I should investigate further.
          Hide
          david.daly David Daly added a comment -

          Chung-yen Chang It's a pretty dramatic change on that commit (order of magnitude). On my box it went from ~10.5k ops/sec to ~900 ops/s. Every commit I tested was either > 10k or < 1k.

          Show
          david.daly David Daly added a comment - Chung-yen Chang It's a pretty dramatic change on that commit (order of magnitude). On my box it went from ~10.5k ops/sec to ~900 ops/s. Every commit I tested was either > 10k or < 1k.
          Hide
          david.daly David Daly added a comment -

          I should add that this is a single threaded test workload, updating one document, so I wouldn't expect a need for a lot of sessions or concurrency.

          Show
          david.daly David Daly added a comment - I should add that this is a single threaded test workload, updating one document, so I wouldn't expect a need for a lot of sessions or concurrency.
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: SERVER-19445 Have the oldest transaction update the oldest tracked ID.

          This will help keep the oldest tracked transation ID from falling too far
          behind, particularly in single-threaded workloads. That means we can keep
          trimming obsolete updates, preventing pages from growing too large.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/d5bddf2b3f3e2acb0b4d5f161de0b522d01fa12e

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: SERVER-19445 Have the oldest transaction update the oldest tracked ID. This will help keep the oldest tracked transation ID from falling too far behind, particularly in single-threaded workloads. That means we can keep trimming obsolete updates, preventing pages from growing too large. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/d5bddf2b3f3e2acb0b4d5f161de0b522d01fa12e
          Hide
          michael.cahill Michael Cahill added a comment -

          For posterity, what was going on in this workload was single-threaded updates to a single document. The document was moderately large, so if WiredTiger accumulated enough versions of the document, it would hit the 10MB limit on the size of a page in memory and forcibly evict the page.

          There was a change in the drop for 3.1.5 that made WiredTiger lazier about updating the oldest transaction ID in the system. This value determines how many version of the document we think need to stay in cache. We used to update that value actively every time a transaction started, but some workloads go faster if regular operations don't update that shared state.

          The unfortunate consequence of that change was to trigger eviction in this workload – the oldest ID was lagging far enough behind that we kept around enough versions of the document to reach the 10MB limit. The solution committed into WiredTiger develop is to bump that oldest transaction ID when a transaction commits, if it thought it was the oldest one running. This isn't exact, but particularly for single-threaded workloads it should have the effect of having the oldest ID track the current ID much more closely, which means we throw old versions of the document away much more quickly and never cause unnecessary eviction.

          Show
          michael.cahill Michael Cahill added a comment - For posterity, what was going on in this workload was single-threaded updates to a single document. The document was moderately large, so if WiredTiger accumulated enough versions of the document, it would hit the 10MB limit on the size of a page in memory and forcibly evict the page. There was a change in the drop for 3.1.5 that made WiredTiger lazier about updating the oldest transaction ID in the system. This value determines how many version of the document we think need to stay in cache. We used to update that value actively every time a transaction started, but some workloads go faster if regular operations don't update that shared state. The unfortunate consequence of that change was to trigger eviction in this workload – the oldest ID was lagging far enough behind that we kept around enough versions of the document to reach the 10MB limit. The solution committed into WiredTiger develop is to bump that oldest transaction ID when a transaction commits, if it thought it was the oldest one running. This isn't exact, but particularly for single-threaded workloads it should have the effect of having the oldest ID track the current ID much more closely, which means we throw old versions of the document away much more quickly and never cause unnecessary eviction.

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: