Core Server / SERVER-16662

Extended pauses in WiredTiger when workers greatly exceed number of cores

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Storage, WiredTiger
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      This is a continuation of SERVER-16269.

      • tests run on Ubuntu, VMWare, 6 cores.
      • THP confirmed disabled (both /sys/kernel/mm/transparent_hugepage/{defrag,enabled} set to "never").
      • using a recent pre-rc4 build from master (6105f06402fe1e7578d41f3e4e583a1476ef2455).

      Under heavy mixed workload with 100 clients, periodic extended pauses of several seconds with zero throughput are seen, for example at A and B:

      • Each pause ends with reports in mongod log of operations taking as long as the pause, up to 11852 ms in this test (you can read this off of the "max logged query" graph: vertical axis is longest operation, in ms, logged in each 0.1 second interval). The application would see this as an anomaly.
      • During each pause, system CPU utilization and the context switch rate go very high. This is consistent with multiple threads spinning in a lock that calls sched_yield. The CPU contention caused by the spin lock, combined with a high ratio of threads to cores, would extend the pause by starving the thread being waited for.
      • There is a general roughness, i.e. very short spikes on the order of the 0.1-second sampling interval, in system CPU and context switches. These seem to correlate with a similar roughness, i.e. very short dips, in the overall throughput. This suggests that even outside the pauses, overall throughput is being affected by CPU contention due to the sched_yield spin locks.
      • The particularly long pause at B seems to be associated with the end of a checkpoint.
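
      The failure mode described above can be illustrated with a minimal sketch. This is an illustration of the general sched_yield-based spin-lock pattern and its oversubscription problem, not WiredTiger's actual lock implementation:

      ```c
      /* Sketch: a test-and-set spin lock that yields on contention.
       * With many more runnable threads than cores, each failed acquire
       * burns a sched_yield system call, driving up system CPU and the
       * context-switch rate, and the lock holder can be starved off the
       * CPU, stretching out the pause for everyone waiting. */
      #include <assert.h>
      #include <pthread.h>
      #include <sched.h>
      #include <stdatomic.h>
      #include <stdio.h>

      static atomic_flag lock = ATOMIC_FLAG_INIT;
      static long counter = 0;

      static void spin_lock(void) {
          /* Every failed attempt yields the CPU back to the scheduler. */
          while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
              sched_yield();
      }

      static void spin_unlock(void) {
          atomic_flag_clear_explicit(&lock, memory_order_release);
      }

      static void *worker(void *arg) {
          (void)arg;
          for (int i = 0; i < 100000; i++) {
              spin_lock();
              counter++;        /* critical section */
              spin_unlock();
          }
          return NULL;
      }

      int main(void) {
          /* 8 workers on a 6-core box already oversubscribes the CPUs;
           * the ticket's workloads ran 50-100 client threads. */
          pthread_t t[8];
          for (int i = 0; i < 8; i++)
              pthread_create(&t[i], NULL, worker, NULL);
          for (int i = 0; i < 8; i++)
              pthread_join(t[i], NULL);
          assert(counter == 8L * 100000);
          printf("counter=%ld\n", counter);
          return 0;
      }
      ```

      Running this under a tool like vmstat shows the signature seen in the graphs: high system (not user) CPU time and a context-switch rate that scales with the thread count.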

      The length of the pauses is roughly proportional to the number of threads, supporting the theory that CPU contention due to the spin lock exacerbates the pauses. Here's the same 200-second run with 50 clients, showing pauses roughly half the length of the pauses with 100 clients above.

      Similar symptoms are seen in a less extreme form with a 100-thread insert-only workload:

      • The periodic dips (A, B, C, D) are briefer in this insert-only workload than in the mixed workload above. This suggests that non-insert operations, probably reads, are particularly affected by the periodic dips.
      • The disruption from E to F, apparently associated with the end of a checkpoint, is as bad for the insert-only workload as it is for the mixed workload: mongod reports operations taking up to 11305 ms. This suggests that this pause may be a little different, as it appears to affect inserts as well.

      Insert-only workload:

      // Bulk-inserts `count` empty documents into db.c in unordered
      // batches of 1000 via the mongo shell's bulk API.
      function insert(count) {
          var every = 1000;       // documents per bulk operation
          var t = new Date();     // start time (kept from the original run)
          for (var i = 0; i < count; ) {
              var bulk = db.c.initializeUnorderedBulkOp();
              for (var j = 0; j < every; j++, i++)
                  bulk.insert({});
              bulk.execute();
          }
      }
      insert(200000)

      Attachments

      1. try15.png
        169 kB
      2. wt-pre4-jp-100.png
        159 kB
      3. wt-pre4-jp-100-nolog.png
        170 kB
      4. wt-pre4-jp-50.png
        158 kB
      5. wt-rc4-jp-50-nojournal.png
        152 kB
      6. wt-rc4-jp-50-nojournal-pause1.png
        454 kB
      7. wt-rc4-jp-50-nojournal-pause2.png
        449 kB

        Issue Links

          Activity

          John Page added a comment (edited)

          Just seen/realised this is especially pertinent in a sharded setup, as each mongos router will make a whole bunch of connections and thus threads. I'm seeing that as I add routers, overall performance drops in 2.8. However, it also drops with mmapv1 and it doesn't in 2.6, so this is more than simply WiredTiger-related.

          Michael Cahill added a comment

          This workload should be considerably improved by RC5.


            People

            • Votes: 0
            • Watchers: 27
