Core Server / SERVER-20091

Poor query throughput and erratic behavior at high connection counts under WiredTiger

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.0.5
    • Fix Version/s: 3.0.7, 3.1.8
    • Component/s: Concurrency, WiredTiger
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Sprint:
      QuInt 8 08/28/15

      Description

      • single collection, 100k documents (fits in memory)
      • 12 cpus (6 cores)
      • workload is n connections each querying a random single document in a loop by _id
      • measured raw flat-out maximum capacity by using 150 connections each doing queries as fast as possible; similar numbers for WT (174k queries/s) and mmapv1 (204k queries/s)
      • then measured a simulated customer app by introducing a delay in the loop so that each connection executes 10 queries/s, and ramped the number of connections up to 10k, for an expected throughput of 10 queries/connection/s * 10k connections = 100k queries/s. This is well below (about half) the measured maximum raw capacity for both WT and mmapv1, so we expect to achieve close to 100k queries/s at 10k connections (both workload variants are sketched below)
      • mmapv1 does achieve close to that (75k queries/s), but WT gets only about 25k queries/s at 10k connections, and its behavior is erratic
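
      Both workload variants can be expressed as benchRun/benchStart op lists; a rough sketch (the full repro code is at the end of the description):

          // flat-out variant: no delay, each worker issues queries as fast as possible,
          // e.g. benchStart({ops: ops_max, seconds: 60, parallel: 150}) for the 150-connection run
          var ops_max = [{op: "query", ns: "test.c",
                          query: {_id: {"#RAND_INT": [0, 10000]}}}]
          // simulated-app variant: ~100 ms delay after each op, i.e. ~10 queries/s per connection
          var ops_app = [{op: "query", ns: "test.c",
                          query: {_id: {"#RAND_INT": [0, 10000]}},
                          delay: NumberInt(100)}]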

      mmapv1

      • max raw capacity is 204k queries/s (as described above, this is with 150 connections each issuing queries as fast as possible)
      • as connections are ramped up to 10k, this time with each connection issuing only 10 queries/s, throughput behavior is excellent up to about 6k connections, with some mild falloff above that
      • at 10k connections getting about 75k queries/s (estimated by fitting the blue quadratic trendline), not too far below the expected 100k queries/s

      WiredTiger

      • max raw capacity is similar to mmapv1 at 174k queries/s (as described above, this is with 150 connections each issuing queries as fast as possible)
      • but as connections are ramped up to 10k, this time with each connection issuing only 10 queries/s, behavior becomes erratic above about 3k connections
      • at 10k connections getting only about 25k queries/s (estimated by fitting the blue quadratic trendline), far below the expected 100k queries/s

      Repro code:

      // Setup: load 100k small documents into test.c in batches of 10k.
      // _id is set to the integer i so that the random-_id queries below
      // actually hit documents; x is a small array payload.
      function repro_setup() {
          var x = []
          for (var i=0; i<100; i++)
              x.push(i)
          var count = 100000
          var every = 10000
          for (var i=0; i<count; ) {
              var bulk = db.c.initializeUnorderedBulkOp();
              for (var j=0; j<every; j++, i++)
                  bulk.insert({_id: i, x: x})
              bulk.execute();
              print(i)
          }
      }

      // Current number of open connections; used to decide when we have
      // ramped up to the requested number of query threads.
      function conns() {
          return db.serverStatus().connections.current
      }

      // Ramp up to threads_query query connections, 10 at a time. Each
      // benchStart call launches 10 background workers that run for
      // 'seconds' seconds, each issuing a random query by _id with a
      // ~100 ms delay per op, i.e. ~10 queries/s per connection.
      // The benchStart handle is not collected; workers simply run out.
      function repro(threads_query, seconds) {
          seconds = seconds || 600    // arbitrary default duration
          var start_conns = conns()
          while (conns() < start_conns + threads_query) {
              var ops_query = [{
                  op: "query",
                  ns: "test.c",
                  query: {_id: {"#RAND_INT": [0, 10000]}},
                  delay: NumberInt(100 + Math.random()*10 - 5)
              }]
              benchStart({
                  ops: ops_query,
                  seconds: seconds,
                  parallel: 10,
              })
              sleep(100)
          }
      }
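
      To run the repro from a mongo shell against the test server (a minimal sketch; the 600 s duration here is arbitrary):

          repro_setup()        // load 100k documents into test.c
          repro(10000, 600)    // ramp to ~10k connections, ~10 queries/s each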
      

        Attachments

        1. NetworkCounters.patch (2 kB, Bruce Lucas)
        2. combined.png (23 kB)
        3. mmapv1.png (19 kB)
        4. mutex.png (18 kB)
        5. mutex2.png (15 kB)
        6. network-mmap.png (19 kB)
        7. network-wt.png (20 kB)
        8. slowms.png (21 kB)
        9. variations.png (23 kB)
        10. wt.png (22 kB)


          Activity

          michael.cahill Michael Cahill added a comment -

          Seems like this should be safe to backport?
          xgen-internal-githook Githook User added a comment -

          Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>
          Message: SERVER-20091: Use mutex instead of spinlock to protect session cache
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/604c22e9b45f4e08c042fb1857bb34ea20de0fb1
          bruce.lucas Bruce Lucas added a comment -

          Found another spinlock that seems to be responsible for the remaining scalability issue and erratic behavior seen above, this one relating to network counter stats:

             3325 0.0;1;;clone:111;start_thread:312;mongo::PortMessageServer::handleIncomingMsg:232;mongo::NetworkCounter::hit:152;lock:76;mongo::SpinLock::_lk:95;nanosleep:81
              340 0.0;1;;clone:111;start_thread:312;mongo::PortMessageServer::handleIncomingMsg:232;mongo::NetworkCounter::hit:152;lock:76;mongo::SpinLock::_lk:87;sched_yield:81
              226 0.0;1;;clone:111;start_thread:312;mongo::PortMessageServer::handleIncomingMsg:219;mongo::MessagingPort::recv:183;mongo::Socket::recv:762;mongo::Socket::unsafe_recv:772;mongo::Socket::_recv:784;recv:44;__libc_recv:33
          

          In this single sample about 3k threads are sleeping in the spinlock guarding the network counters, while 340 threads are doing a busy wait calling sched_yield, creating severe CPU contention.

          A POC patch is attached that eliminates the lock and uses atomic ints for the counters instead. As the following measurements show, this, together with the fix for the session cache spinlock, seems to completely fix the scalability issues and essentially eliminates the erratic behavior. The downside of this approach is that there may be skew between the network counters (bytes in, bytes out, number of requests), but I think this is acceptable since the primary use of these counters is to compute rates over periods of typically 1 s or more, so precise alignment of the counters at the instruction level is not important.
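
          (These are the counters reported under serverStatus().network; a minimal shell sketch of the typical rate computation over a ~1 s window, at which granularity any instruction-level skew between the counters is invisible:)

              // sample the network counters twice, ~1 s apart, and print approximate rates
              var s1 = db.serverStatus().network
              sleep(1000)
              var s2 = db.serverStatus().network
              print("requests/s: " + (s2.numRequests - s1.numRequests))
              print("bytesIn/s:  " + (s2.bytesIn - s1.bytesIn))
              print("bytesOut/s: " + (s2.bytesOut - s1.bytesOut))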

          The remaining question is why this appears to impact WT more than mmapv1. I suspect that is because this is a CPU contention issue, and WT in general uses more CPU than mmapv1 for the same task.

          Red samples are with the previous fix for the session cache spinlock only; blue samples add NetworkCounters.patch.

          xgen-internal-githook Githook User added a comment -

          Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>
          Message: SERVER-20091: turn network counters into atomics
          Branch: master
          https://github.com/mongodb/mongo/commit/4ac0bfb55d9cccbc6784ded607eae14312ec9bcc
          xgen-internal-githook Githook User added a comment -

          Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>
          Message: SERVER-20091: turn network counters into atomics

          (cherry picked from commit 4ac0bfb55d9cccbc6784ded607eae14312ec9bcc)
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/9221fcee30937ae930464d55ce0e275dae6d1795

            People

            • Votes: 0
            • Watchers: 15