WiredTiger / WT-3137

Hang in __log_slot_join/__log_slot_switch_internal

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: WT2.9.2, 3.2.13, 3.4.3, 3.5.4
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Sprint: Storage 2017-01-23, Storage 2017-02-13

      • 24 cpus, 64 GB memory, 1 TB hard drive
      • Ubuntu 15.10
      • mongod 3.4.0
      • standalone with: --wiredTigerCacheSizeGB 8 --wiredTigerCollectionBlockCompressor=zlib

      Run 20 threads of the attached repro script with:

          for t in $(seq 20); do
              mongo --quiet test --eval "load('repro.js'); insert()" &
          done
          wait
      

      The script inserts 139 M generic text documents ranging from 1 kB to 4 MB according to a probability distribution encoded in the script, with smaller documents being much more common. The average document size is about 60 kB, and documents typically compress about 12:1 with zlib compression in the db.
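
      The attached repro.js is authoritative; purely as an illustration of the kind of workload described above, a mongo shell insert function along these lines would produce a similar size mix. The size buckets, weights, per-thread document count, and collection name below are assumptions, not values taken from the attached script:

          // Illustrative sketch only -- see the attached repro.js for the real script.
          // Size buckets and weights are assumptions chosen so that small documents
          // dominate and the mean lands in the tens of kilobytes.
          function insert() {
              var buckets = [
                  {size: 1024,            weight: 70},  // 1 kB
                  {size: 16 * 1024,       weight: 20},  // 16 kB
                  {size: 256 * 1024,      weight: 9},   // 256 kB
                  {size: 4 * 1024 * 1024, weight: 1}    // 4 MB
              ];
              var total = buckets.reduce(function (s, b) { return s + b.weight; }, 0);

              // Weighted random pick of a document size.
              function pickSize() {
                  var r = Math.random() * total;
                  for (var i = 0; i < buckets.length; i++) {
                      r -= buckets[i].weight;
                      if (r <= 0) return buckets[i].size;
                  }
                  return buckets[buckets.length - 1].size;
              }

              var filler = new Array(1024 + 1).join("x");  // 1 kB of generic text
              // Per-thread document count is a placeholder; each of the 20 shell
              // invocations runs this loop against the same collection.
              for (var n = 0; n < 1000000; n++) {
                  var kb = Math.floor(pickSize() / 1024);
                  db.c.insert({t: new Array(kb + 1).join(filler)});
              }
          }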

      In three runs the script:

      • hung at ~35%
      • reached 50% without hanging before I terminated it
      • hung at about 50%

      When it hung, mongod was consuming 2100% CPU, the majority of it system CPU. Per the gdb stacks (attached), I think this is accounted for by 20 threads spinning in __log_slot_join and one thread spinning in __log_slot_switch_internal.

      Attachments:

        1. log+gdb.tgz (30 kB)
        2. log+gdb-2.tgz (39 kB)
        3. r0.log (205 kB)
        4. repro.js (82 kB)
        5. repro-01-12-mongod.log (154 kB)
        6. repro-01-13-mongod.log (155 kB)
        7. stacks1.txt (70 kB)
        8. stacks2.txt (70 kB)
        9. stacks3.txt (70 kB)

            Assignee: Susan LoVerso (sue.loverso@mongodb.com)
            Reporter: Bruce Lucas (bruce.lucas@mongodb.com) (Inactive)
            Votes: 0
            Watchers: 10
