Core Server / SERVER-19522

Capped collection insert rate declines over time under WiredTiger

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.6, 3.1.7
    • Component/s: WiredTiger
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Capped collection insert rate begins to decline steadily once the collection is full and deletions start. Restarting mongod (without disturbing the collection) restores inserts to the original rate, but the rate then declines again to a fraction of the initial rate. (A hypothetical reproduction sketch follows the numbers below.)

      20486 mb, 72577 ins/s
      20489 mb, 63780 ins/s
      20494 mb, 61213 ins/s
      20495 mb, 55644 ins/s
      20491 mb, 50556 ins/s
      20494 mb, 46906 ins/s
      20493 mb, 39926 ins/s
      20496 mb, 36380 ins/s
      20496 mb, 34873 ins/s
      20496 mb, 43359 ins/s
      20492 mb, 36704 ins/s
      20491 mb, 25022 ins/s
      20493 mb, 26334 ins/s
      20494 mb, 27420 ins/s
      20496 mb, 26594 ins/s
      20493 mb, 25970 ins/s
      20496 mb, 27202 ins/s
      20496 mb, 25868 ins/s
      20492 mb, 26571 ins/s
      20491 mb, 26460 ins/s
      20496 mb, 26840 ins/s
      20496 mb, 26273 ins/s
      20491 mb, 25883 ins/s
      20496 mb, 25342 ins/s
      20491 mb, 25477 ins/s
      20496 mb, 24753 ins/s
      20496 mb, 24196 ins/s
      20492 mb, 24092 ins/s
      20491 mb, 23656 ins/s
      20496 mb, 23656 ins/s
      20495 mb, 23213 ins/s
      20496 mb, 22776 ins/s
      20496 mb, 23280 ins/s
      20495 mb, 22710 ins/s
      20496 mb, 22352 ins/s
      20492 mb, 22554 ins/s
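
      A reproduction sketch in Python/pymongo follows; the original harness is not attached, so the collection name, document shape and size, and the per-second reporting loop are assumptions based on the numbers above and the hardware described in the comments:

      # Hypothetical reproduction sketch -- collection name, document shape,
      # and doc size are assumptions, not the original harness.
      import threading
      import time
      from pymongo import MongoClient

      client = MongoClient()  # assumes a local mongod running WiredTiger
      db = client.test
      db.drop_collection("capped_repro")
      coll = db.create_collection("capped_repro", capped=True,
                                  size=20 * 1024 ** 3)  # 20 GB cap

      DOC = {"payload": "x" * 1000}  # ~1 KB documents (assumed)
      counter = [0]
      lock = threading.Lock()

      def insert_worker():
          while True:
              coll.insert_one(dict(DOC))  # copy: insert_one adds _id in place
              with lock:
                  counter[0] += 1

      for _ in range(24):  # 24 insert threads, as in the comments below
          threading.Thread(target=insert_worker, daemon=True).start()

      # Print collection size and insert rate once per second, mirroring the
      # "NNNNN mb, NNNNN ins/s" lines above.
      prev = 0
      while True:
          time.sleep(1)
          cur = counter[0]
          stats = db.command("collStats", "capped_repro")
          print("%d mb, %d ins/s" % (stats["size"] // 2 ** 20, cur - prev))
          prev = cur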
      

      The effect seems to be worse with larger documents and larger collections (a parameter-sweep sketch follows the table):

      doc size    cap      initial rate    final rate    final/initial
      200         2GB      194             160           82%
      1000        10GB     140             84            60%
      1000        20GB     140             63            45%
      2000        20GB     115             50            43%
      4000        20GB     83              23            27%
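
      A hypothetical sweep over the same (doc size, cap) combinations, reusing the insert harness from the sketch above; treating "initial" as the rate before the collection fills and "final" as the steady rate after deletions begin is an assumption:

      from pymongo import MongoClient

      db = MongoClient().test
      # The (doc size, cap) combinations from the table above.
      for doc_size, cap_gb in [(200, 2), (1000, 10), (1000, 20),
                               (2000, 20), (4000, 20)]:
          db.drop_collection("capped_repro")
          coll = db.create_collection("capped_repro", capped=True,
                                      size=cap_gb * 1024 ** 3)
          doc = {"payload": "x" * doc_size}
          # ... run the timed insert threads from the earlier sketch against
          # `coll`, recording the rate before the cap is reached ("initial")
          # and again well after deletions begin ("final").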
      

      Stack trace excerpts showing involvement of WT delete (complete stack traces attached as capped-delete.html):

      Attachments:

      • capped-delete.html (4.98 MB, Bruce Lucas)
      • 19522.png (70 kB)
      • capped-delete.png (399 kB)
      • patch50.png (9 kB)


          Activity

          Martin Bligh (Inactive) added a comment -

          The only reason I can see to do it in WT is if you can think of some way to make lookups faster than having to do N lookups when we partition into N pieces. Or if you keep cumulative per-child record counts / size sums? (A sketch of that idea follows.)
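
          A minimal sketch of that idea, purely illustrative (nothing like this exists in WiredTiger or mongod): keep a cumulative record count per piece so a lookup by logical position is one binary search over N counters rather than N lookups:

          # Illustration only -- not WiredTiger or mongod code. Partition the
          # capped data into N pieces and keep cumulative per-piece counters
          # so a positional lookup binary-searches the counters once.
          import bisect

          class PartitionedCapped:
              def __init__(self, n_pieces):
                  self.pieces = [[] for _ in range(n_pieces)]

              def _cumulative_counts(self):
                  totals, running = [], 0
                  for piece in self.pieces:
                      running += len(piece)
                      totals.append(running)
                  return totals

              def lookup(self, pos):
                  # One binary search over N counters finds the owning piece.
                  totals = self._cumulative_counts()
                  i = bisect.bisect_right(totals, pos)
                  offset = pos - (totals[i - 1] if i else 0)
                  return self.pieces[i][offset]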

          Bruce Lucas added a comment -

          My test run was standalone; everything else is as described above (12 core / 24 cpu machine, 64 GB mem, 24 threads of the workload). Any differences in throughput from my tests presumably are due to some difference in configuration or test parameters.

          In particular my test was entirely CPU bound: the 20 GB capped collection fits in the default 32 GB cache. You can also see this from the stack traces I attached: when deletions begin, 24 threads spend much of their time in pthread_mutex_timedlock called from cappedDeleteAsNeeded, waiting for a single thread to do the deletions, and that single thread is clearly CPU-bound - spending almost all its time in __wt_tree_walk, and much of that time in __wt_evict, with almost no i/o among those stack traces. So it seems if it were possible to eliminate the evictions that would provide improvement.

          I also wonder if it might be possible to improve things by allowing more than one thread to do the deletions - for example, n threads each deleting every nth record of a range, to provide a better match for the n threads doing insertion (sketched below). How much contention would limit the speedup is TBD.
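
          A schematic sketch of that striping idea, with n deleter threads each taking every nth of the oldest records. Purely illustrative: cappedDeleteAsNeeded is server-internal C++ and user code cannot delete from a capped collection, so this only shows the partitioning scheme:

          # Illustration of striped deletion -- not how cappedDeleteAsNeeded
          # works, and not runnable against a real capped collection (user
          # deletes there are disallowed); it only shows the striping scheme.
          import threading

          def delete_stripe(coll, oldest_ids, idx, n_threads):
              # Thread idx deletes records idx, idx + n, idx + 2n, ...
              for _id in oldest_ids[idx::n_threads]:
                  coll.delete_one({"_id": _id})

          def parallel_capped_delete(coll, n_threads, batch):
              # Oldest `batch` records in insertion ($natural) order.
              oldest_ids = [d["_id"] for d in
                            coll.find({}, {"_id": 1},
                                      sort=[("$natural", 1)], limit=batch)]
              threads = [threading.Thread(target=delete_stripe,
                                          args=(coll, oldest_ids, i, n_threads))
                         for i in range(n_threads)]
              for t in threads:
                  t.start()
              for t in threads:
                  t.join()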

          Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Merge pull request #2091 from wiredtiger/intl-page-empty

          SERVER-19522 Try to evict internal pages with no useful child pages.
          (cherry picked from commit fb8739f423a824b76e1a26cbaa28c1edaa2f2f7d)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/000ff74c23e153a3802cbf04523e72a08f00e3e3

          Michael Cahill added a comment -

          [~bruce.lucas@10gen.com], I am inclined to resolve the original issue reported here as fixed in 3.0.6 and 3.1.7. That is, ignoring the drop-off when the capped collection becomes full, the performance over time is now stable.

          I'm happy to investigate the impact of the capped deletes further (though I don't have any good ideas at the moment), but I think that should have a separate ticket. Does that seem reasonable to you?

          Ramon Fernandez added a comment -

          The capped collection insert rate decline over time has been fixed, so I'm resolving this issue.

          I've created SERVER-19995 to investigate the performance drop of capped deletes when the collection is full.


            People

            • Votes: 0
            • Watchers: 12