[SERVER-17204] Crash before completion of first checkpoint after table create can cause irrecoverable db corruption under WiredTiger
Created: 06/Feb/15  Updated: 20/May/20  Resolved: 12/Feb/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0-rc8
Fix Version/s: 3.0.0-rc9, 3.1.0

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: Alexander Gorrod
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-17451 WiredTiger unable to start if crash l... Closed
is related to SERVER-17152 WiredTiger file corrupted during powe... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Linked BF Score: 0

Description

If a crash occurs after a table is created but before its first checkpoint completes, irrecoverable data loss may occur: after reboot, the table's file may be 4 KB long yet contain invalid data (for example, all zeros):

$ hexdump collection-2-1933547346719198530.wt
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000

However, the log may still contain references to that file, which causes the following irrecoverable error during recovery:

2015-02-06T13:53:10.673-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2015-02-06T13:53:10.673-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=5G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-02-06T13:53:12.063-0500 E STORAGE  [initandlisten] WiredTiger (-31802) [1423248792:63060][1543:0x7f317f08bbc0], file:collection-2-1933547346719198530.wt: collection-2-1933547346719198530.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2015-02-06T13:53:12.063-0500 E STORAGE  [initandlisten] WiredTiger (-31802) [1423248792:63221][1543:0x7f317f08bbc0], file:collection-2-1933547346719198530.wt: Operation failed during recovery: WT_ERROR: non-specific WiredTiger error
2015-02-06T13:53:12.077-0500 I -        [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error
2015-02-06T13:53:12.080-0500 I STORAGE  [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
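
To check whether a file named in such an error is in the zero-filled state shown in the hexdump above, a small script can stand in for reading hexdump output by eye. This is only a diagnostic sketch: the function name `is_all_zeros` and the chunk size are illustrative choices, not part of WiredTiger or MongoDB.

```python
def is_all_zeros(path: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file is non-empty and every byte in it is 0x00."""
    saw_data = False
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        while chunk := f.read(chunk_size):
            saw_data = True
            # bytes.count(0) counts zero bytes; any other byte fails the check.
            if chunk.count(0) != len(chunk):
                return False
    # An empty file is reported as False: it is not the 4 KB zero-filled case.
    return saw_data
```

For example, `is_all_zeros("collection-2-1933547346719198530.wt")` would be expected to return True for the file dumped above, since hexdump shows 0x1000 bytes of zeros.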

strace shows the apparent cause: when creating the file, we write the initial 4 KB block but close the file without fsync'ing it:

1560  open("db/collection-2-2119903794654001319.wt", O_RDWR|O_CREAT|O_EXCL|O_NOATIME, 0666) = 21
1560  pwrite(21, "A\330\1\0\1\0\0\0\330\10#\267\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
1560  close(21)                         = 0

This can result in the problematic state if a crash occurs after log entries for that file are written but before the file data is flushed to disk.

I believe that fdatasync'ing the file after this first write and before any journal entries referencing the file are written should fix this issue.
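
The proposed ordering can be sketched as follows. This is an illustration of the create, write, fdatasync, close sequence, not the actual WiredTiger change; the function name `create_file_durably` and the `BLOCK_SIZE` constant are hypothetical. The final directory fsync is an extra assumption on my part (it makes the new directory entry durable on many Linux filesystems) and is not something the strace output or this ticket establishes.

```python
import os

BLOCK_SIZE = 4096  # size of the initial block seen in the strace output


def create_file_durably(path: str, first_block: bytes) -> None:
    """Create path exclusively, write its first block, and flush it to disk."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        written = os.pwrite(fd, first_block, 0)
        if written != len(first_block):
            raise OSError("short write while creating " + path)
        # Flush the file data before any journal entry can reference the file.
        # Fall back to fsync on platforms that lack fdatasync.
        getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)

    # Assumption (not from the ticket): also fsync the parent directory so
    # the newly created directory entry survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Only after `create_file_durably` returns would it be safe, under this scheme, to write log records that mention the new file.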

