WiredTiger / WT-2696

Race condition on unclean shutdown may miss log records with large updates

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: WT2.9.0, 3.2.8, 3.3.9
    • Affects Version/s: WT2.8.0
    • Component/s: None
    • Labels:

      Issue Status as of Jul 06, 2016

      Under extremely rare circumstances, a race condition in the code that updates large records may cause some of those updates to be lost during an unclean shutdown.

      On a production system, the code path containing the race condition is only taken when log records are 128k or larger. From MongoDB's perspective the threshold is a smaller record size, perhaps 40k, because a single WT log record contains the inserts into the collection, its indexes, the oplog, etc.
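
      The relationship between record size and log record size can be sketched as simple arithmetic. This is only an illustration: the function names, the example byte counts, and the reading of "128k" as 128 KiB are assumptions, and per-record logging overhead is ignored.

```python
# Hypothetical sketch: names and sizes are illustrative, not taken from WiredTiger.
LARGE_RECORD_THRESHOLD = 128 * 1024  # assumption: "128k" means 128 KiB


def log_record_bytes(collection_bytes, oplog_bytes, index_bytes):
    """Sum the parts that end up in a single log record for one operation.

    collection_bytes: size of the insert into the collection
    oplog_bytes:      size of the matching oplog entry
    index_bytes:      iterable with the size of each index entry
    """
    return collection_bytes + oplog_bytes + sum(index_bytes)


def takes_large_record_path(collection_bytes, oplog_bytes, index_bytes):
    """True when the combined log record reaches the assumed threshold."""
    total = log_record_bytes(collection_bytes, oplog_bytes, index_bytes)
    return total >= LARGE_RECORD_THRESHOLD


if __name__ == "__main__":
    # A record well below 128 KiB can still produce a log record above it.
    print(log_record_bytes(45_000, 45_200, [15_000, 15_000, 15_000]))
    print(takes_large_record_path(45_000, 45_200, [15_000, 15_000, 15_000]))
```

      With these made-up sizes, a 45,000-byte record yields a 135,200-byte log record, which is above the assumed 131,072-byte threshold.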

      Attempts to trigger this race condition in MongoDB using a synthetic workload with compression disabled have produced mixed results. Attempts to reproduce the issue in MongoDB with default compression (snappy) have been unsuccessful.

      This issue only affects users running with journaling enabled. Users who run with journaling disabled cannot be affected by this bug.

      If the race condition is triggered and the node suffers an unclean shutdown, some updates to large records made since the last checkpoint may be lost. Unfortunately, it is not possible to detect whether the race condition has been triggered.

      This issue affects MongoDB 3.2 versions up to and including MongoDB 3.2.7.

      A fix for this issue is included in the MongoDB 3.2.8 production release. Users whose workloads include updates to large records, and whose nodes may be subject to unclean shutdowns, should upgrade to MongoDB 3.2.8 to avoid exposure to this issue.

      Unfortunately there are no known workarounds for this issue.

      Original description


      After rebuilding WiredTiger with diagnostic enabled, one of our tests started to fail.
      The test checks the ability of the DB to recover after an application crash.
      Please see the attached minimized test:

      $ ./recovery-test-mp
      5 writer threads spawned
      killing child
      checking DB...
      no record with key 28363
      no record with key 3689348814741930043
      no record with key 3689348814741983775
      no record with key 7378697629483839817
      no record with key 7378697629483894622
      no record with key 11068046444225735421
      no record with key 14757395258967669044
      no record with key 14757395258967726182
      8 record(s) absent from total of 544769

      I was unable to reproduce the problem without diagnostic enabled.
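
      The check phase of a test like this can be sketched as a comparison between the keys the writers inserted and the keys found after recovery. This is a minimal illustration, not the attached recovery-test-mp.c: the function name `report_missing` is hypothetical, and it assumes the reported total is the number of expected keys.

```python
# Hypothetical sketch of the check phase; not the code from the attached test.
def report_missing(expected_keys, recovered_keys):
    """Compare expected keys with the keys present after recovery.

    Returns the sorted list of missing keys and the report lines,
    formatted like the output shown in this issue.
    """
    expected = set(expected_keys)
    recovered = set(recovered_keys)
    missing = sorted(expected - recovered)
    lines = [f"no record with key {key}" for key in missing]
    # Assumption: the total counts the keys the writers expected to be durable.
    lines.append(f"{len(missing)} record(s) absent from total of {len(expected)}")
    return missing, lines


if __name__ == "__main__":
    # Keys 4 and 9 were written but are not found after the simulated crash.
    _, report = report_missing(range(1, 11), [1, 2, 3, 5, 6, 7, 8, 10])
    print("\n".join(report))
```

      A real test would obtain `recovered_keys` by reopening the database after the child process is killed and scanning the table. Any nonempty list of missing keys means recovery lost updates that should have been logged.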


        1. check.2696.js
          0.6 kB
        2. insert.2696.js
          0.3 kB
        3. recovery-test-mp.c
          5 kB
        4. run2696.sh
          3 kB
        5. runloop.sh
          0.3 kB

            Susan LoVerso (sue.loverso@mongodb.com)
            Dmitri Shubin