Core Server / SERVER-17204

Crash before completion of first checkpoint after table create can cause irrecoverable db corruption under WiredTiger

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.0.0-rc9, 3.1.0
    • Affects Version/s: 3.0.0-rc8
    • Component/s: WiredTiger
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      If a crash occurs after a table is created but before its first checkpoint completes, the data can become unrecoverable: after reboot, the table's file may be 4 KB long but contain invalid data (for example, all zeros):

      $ hexdump collection-2-1933547346719198530.wt
      0000000 0000 0000 0000 0000 0000 0000 0000 0000

      However, the journal may still contain records referencing that file, so recovery fails with the following unrecoverable error:

      2015-02-06T13:53:10.673-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2015-02-06T13:53:10.673-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=5G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-02-06T13:53:12.063-0500 E STORAGE  [initandlisten] WiredTiger (-31802) [1423248792:63060][1543:0x7f317f08bbc0], file:collection-2-1933547346719198530.wt: collection-2-1933547346719198530.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
      2015-02-06T13:53:12.063-0500 E STORAGE  [initandlisten] WiredTiger (-31802) [1423248792:63221][1543:0x7f317f08bbc0], file:collection-2-1933547346719198530.wt: Operation failed during recovery: WT_ERROR: non-specific WiredTiger error
      2015-02-06T13:53:12.077-0500 I -        [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error
      2015-02-06T13:53:12.080-0500 I STORAGE  [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
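      To confirm that a file rejected during recovery is in the zero-filled state described above, a small check of its first block can help. The following is a minimal sketch of a hypothetical diagnostic helper, `first_block_is_zeroed` (not part of MongoDB or WiredTiger), assuming the 4 KB initial block size reported in this issue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Size of the initial block written at file creation (4096 in the strace). */
#define INITIAL_BLOCK_SIZE 4096

/*
 * Hypothetical diagnostic helper, not part of MongoDB or WiredTiger.
 * Returns  1 if the file's first INITIAL_BLOCK_SIZE bytes are all zero,
 *          0 if any of those bytes is non-zero,
 *         -1 if the file cannot be opened or is shorter than one block.
 */
int first_block_is_zeroed(const char *path)
{
    unsigned char buf[INITIAL_BLOCK_SIZE];
    size_t have = 0;

    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* read() may return fewer bytes than requested, so loop. */
    while (have < sizeof(buf)) {
        ssize_t n = read(fd, buf + have, sizeof(buf) - have);
        if (n <= 0) { /* EOF before a full block, or a read error */
            close(fd);
            return -1;
        }
        have += (size_t)n;
    }
    close(fd);

    for (size_t i = 0; i < sizeof(buf); i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

      A result of 1 only confirms the zero-fill symptom; it does not by itself show why the block was never persisted.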

      strace shows the apparent cause: when creating the file, we write the 4 KB block but never fsync it:

      1560  open("db/collection-2-2119903794654001319.wt", O_RDWR|O_CREAT|O_EXCL|O_NOATIME, 0666) = 21
      1560  pwrite(21, "A\330\1\0\1\0\0\0\330\10#\267\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
      1560  close(21)                         = 0

      This produces the failure above if a crash occurs after journal records referencing the file have been written but before the file's data has been flushed to disk.

      I believe that calling fdatasync on the file after this first write, and before any journal records referencing the file are written, should fix this issue.
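      The proposed ordering can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the actual WiredTiger change: `create_table_file`, `journal_append`, and `create_table_then_log` are hypothetical names, the journal is modeled as a plain file descriptor synced after each record, and the Linux-specific O_NOATIME flag from the strace is omitted. The key point is that fdatasync() on the new file returns before any journal record naming the file is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define INITIAL_BLOCK_SIZE 4096

/* Write all of buf at offset off, retrying on short writes and EINTR. */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Hypothetical: create the file, write its initial block, and make that
 * block durable with fdatasync() before returning (the step missing from
 * the strace). The directory entry itself would additionally need an
 * fsync() of the parent directory, which this sketch does not do.
 */
int create_table_file(const char *path, const unsigned char *block)
{
    assert(path != NULL && block != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1;
    if (pwrite_all(fd, block, INITIAL_BLOCK_SIZE, 0) != 0 ||
        fdatasync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Hypothetical journal append: write the whole record, then sync it. */
int journal_append(int journal_fd, const void *rec, size_t len)
{
    const char *p = rec;
    while (len > 0) {
        ssize_t n = write(journal_fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return fdatasync(journal_fd);
}

/* Log the create only after the file's data is known to be on disk. */
int create_table_then_log(const char *path, const unsigned char *block,
                          int journal_fd, const void *rec, size_t rec_len)
{
    if (create_table_file(path, block) != 0)
        return -1; /* never journal a file we could not make durable */
    return journal_append(journal_fd, rec, rec_len);
}
```

      Because create_table_then_log() skips journal_append() whenever the data sync fails, a crash at any point leaves either no journal record for the file or a record for a file whose initial block has already been synced.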

            Assignee: Alexander Gorrod (alexander.gorrod@mongodb.com)
            Reporter: Bruce Lucas (Inactive) (bruce.lucas@mongodb.com)