Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19954

Extended stall during checkpoint with large number of dirty tables under WiredTiger

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.2.0-rc3
    • Affects Version/s: 3.0.5
    • Component/s: WiredTiger
    • Labels:
    • Fully Compatible
    • ALL

      Test that creates a 50k dirty tables causes a stall during the checkpoint for several minutes.

      shell script that calls functions in test.js:

      # create a bunch of dirty tables and keep them dirty
      # first iteration through creates the tables
      # subsequent iterations keep them dirty
      # issue does not repro with a smaller number of iterations 
      # maybe tables need to become dirty after checkpoint starts?
      for i in $(seq 20); do
              # insert on 20 threads in parallel into a total of 50k collections
              for t in $(seq 20); do
                  mongo --eval "load('test.js'); insert($t, 2500)" &
      # now try to insert some data to make stall evident
          for t in $(seq 20); do
              mongo --eval "load('test.js'); load()" &


      function insert(t, n) {
          for (var i=0; i<n; i++) {
              c = db['c'+t+'.'+i]
              if (i%100==0)
      function load() {
          count = 1000000
          every = 10000
          for (var i=0; i<count; ) {
              var bulk = db.c.initializeUnorderedBulkOp();
              for (var j=0; j<every; j++, i++)

      • From A to B is first iteration through the loop; this goes slower than subsequent iterations because it is creating 50k collections.
      • From B to C is remaining 19 iterations of inserting one document into each of the 50k collections
      • At C we begin trying to insert a bunch of data, but it stalls for several minutes until D.
      • At D checkpoint ends and inserts become unblocked
      • Based on "cursor prev" calls I think the single-document inserts into the 50k collections from B to C are stalling during checkpoints; I suspect the stalls are not as long because in this phase there are fewer dirty tables at each checkpoint because the tables are only being dirtied as fast as they can be created, whereas from B to C the tables are already created so are being dirtied much faster, so the checkpoint where the lengthy stall occurs sees a lot more dirty tables.

      This is the active thread during the stall:

      Thread 31 (Thread 0x7f5949f32700 (LWP 5161)):
      #0  __strcmp_ssse3 () at ../sysdeps/x86_64/multiarch/../strcmp.S:210
      #1  0x000000000134759a in __wt_meta_track_find_handle (session=session@entry=0x279d8c0, name=0xa8b9c30 "file:collection-6000-3939942762316306140.wt",
          checkpoint=checkpoint@entry=0x5e630aa0 "WiredTigerCheckpoint.1") at src/third_party/wiredtiger/src/meta/meta_track.c:245
      #2  0x000000000137994d in __wt_session_lock_checkpoint (session=session@entry=0x279d8c0, checkpoint=0x5e630aa0 "WiredTigerCheckpoint.1") at src/third_party/wiredtiger/src/session/session_dhandle.c:454
      #3  0x0000000001382885 in __checkpoint_worker (is_checkpoint=1, cfg=<optimized out>, session=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:924
      #4  __wt_checkpoint (session=<optimized out>, cfg=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:1127
      #5  0x0000000001381bf7 in __checkpoint_apply (session=0x279d8c0, cfg=0x7f5949f31a30, op=0x13823c0 <__wt_checkpoint>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:184
      #6  0x0000000001383785 in __wt_txn_checkpoint (session=session@entry=0x279d8c0, cfg=cfg@entry=0x7f5949f31a30) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:501
      #7  0x0000000001376966 in __session_checkpoint (wt_session=0x279d8c0, config=<optimized out>) at src/third_party/wiredtiger/src/session/session_api.c:919
      #8  0x000000000130e38a in __ckpt_server (arg=0x279d8c0) at src/third_party/wiredtiger/src/conn/conn_ckpt.c:95
      #9  0x00007f594e1fb182 in start_thread (arg=0x7f5949f32700) at pthread_create.c:312
      #10 0x00007f594d2fc47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

            michael.cahill@mongodb.com Michael Cahill (Inactive)
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            0 Vote for this issue
            10 Start watching this issue