Core Server / SERVER-6391

Yielding with one (or more) active writer and heavy read load results in severe performance degradation

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.0.7, 2.2.0-rc1
    • Affects Version/s: 2.0.2
    • Component/s: Concurrency
    • Labels:
    • Environment: Linux

      Tests with the following command result in execution times of 20 seconds to 10 minutes:

      > benchRun({ops:[{ns:'test.bench', op:'find', query:{a:1}}, {ns:'test.bench', op:'update', query:{a:1}, update:{$set:{z:1}}}], db:'test', parallel:4096});
      

      The collection has 1000 identical documents:

      {
      	"_id" : ObjectId("4ffbd85b3e968c0d937006f4"),
      	"a" : 1,
      	"asdfasdf" : 1,
      	"dsfsfda" : "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      	"z" : 1
      }
      

      With ~3944 concurrent readers and 10-15 concurrent writers:

      • If recommendYieldMicros() lies and says there are 3944 readers and 0 writers, the test completes consistently in 23 seconds.
      • If recommendYieldMicros() lies and says there is 1 reader and 0 writers, the test completes consistently in 20 seconds.
      • If recommendYieldMicros() lies and says there is 1 reader and 1 writer, the test completes in 8 minutes and 14 seconds, with a stall/burst pattern*.
      • If recommendYieldMicros() tells the truth, the test completes in 10 minutes and 54 seconds, with a stall/burst pattern*.

      *NOTE: stall/burst in this sense means minute-long intervals during which only ~300K-500K of data is transmitted, followed by a burst of 1-5 Mbps lasting up to a few seconds. 'Consistently' here means a steady 10-50 Mbps (avg. ~35 Mbps). The daemon and client run on the same VM.
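      For scale, the timings above can be expressed as slowdown factors relative to the fastest run (1 reader / 0 writers reported, 20 seconds). This is simple arithmetic on the reported wall-clock times; the helper name is illustrative:

      ```cpp
      #include <cassert>

      // Slowdown factor of a run, relative to the 20-second baseline
      // (the fastest case reported above).
      double slowdownVsBaseline(int minutes, int seconds) {
          const double baselineSeconds = 20.0;
          return (minutes * 60 + seconds) / baselineSeconds;
      }
      ```

      By this measure, reporting one writer truthfully makes the test roughly 25x slower (8m14s), and fully truthful reporting roughly 33x slower (10m54s), than the writers-hidden baseline.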

      Having zero writers means yieldSuggest() always suggests not yielding. Given the dramatic difference in completion times, this suggests that under heavy load the act of yielding itself is tremendously expensive.
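      The behavior described above reduces to a simple rule: with no active writers the recommendation is always "don't yield", and any active writer causes readers to yield. A minimal sketch of that decision logic (illustrative only; not MongoDB's actual yieldSuggest()/recommendYieldMicros() implementation, and the scaling constant is an assumption):

      ```cpp
      #include <cassert>

      // Hypothetical yield recommendation: 0 micros means "don't yield".
      // With zero writers there is nothing to yield to, so readers keep
      // running -- which is why hard-coding writers to 0 in the tests
      // above eliminates the pathological slowdown.
      int recommendYieldMicros(int activeReaders, int activeWriters) {
          if (activeWriters == 0)
              return 0;  // no writers waiting: never yield
          // Any writer present: yield, scaled by contention
          // (the factor of 100 is purely illustrative).
          int contenders = activeReaders + activeWriters;
          return 100 * contenders;
      }
      ```

      Under this rule, lying about writer counts (as in the experiments above) flips every reader from the slow yield path onto the fast no-yield path.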

      It should be noted that recommendedYieldMicros() is itself expensive when many connections are active, taking on average 1/3 of each thread's execution time with 4k connections. That said, removing this call has a relatively minimal impact on test completion time:

      • With the calls to recommendedYieldMicros() acquiring the clientsMutex lock and writers hard-coded to 0, the aforementioned test completes in 24 seconds, vs. 20 seconds when the lock/count is avoided.
      • With the calls to recommendedYieldMicros() acquiring the clientsMutex lock and writers hard-coded to 1, the test completes in 11 minutes 10 seconds, vs. 8 minutes 14 seconds when the lock/count is avoided.
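      The per-call cost described above is consistent with a count that scans a global client registry under a single mutex on every yield decision. A hedged sketch of that pattern (names and structure are illustrative, not MongoDB's actual Client/clientsMutex code):

      ```cpp
      #include <mutex>
      #include <set>

      // Hypothetical global client registry guarded by one mutex.
      struct Client {
          bool active = false;
          bool writing = false;
      };

      std::mutex clientsMutex;
      std::set<Client*> clients;

      struct Counts { int readers = 0; int writers = 0; };

      // O(n) scan under a global lock. With ~4k connections this runs on
      // every yield decision in every thread, so both the scan itself and
      // contention on clientsMutex plausibly account for the observed
      // ~1/3 of per-thread execution time.
      Counts countActiveClients() {
          std::lock_guard<std::mutex> lk(clientsMutex);
          Counts c;
          for (const Client* cl : clients) {
              if (!cl->active) continue;
              if (cl->writing) ++c.writers;
              else ++c.readers;
          }
          return c;
      }
      ```

      The comparison numbers above suggest the scan/lock overhead is real but secondary: avoiding it saves seconds, while the yield decision it feeds costs minutes.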

            Assignee: benjamin.becker Ben Becker
            Reporter: benjamin.becker Ben Becker
            Votes: 0
            Watchers: 3