[SERVER-18316] Database with WT engine fails to recover after system crash Created: 04/May/15  Updated: 23/Sep/22  Resolved: 26/May/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.1.2
Fix Version/s: 3.0.4, 3.1.4

Type: Bug Priority: Critical - P2
Reporter: Jonathan Abrahams Assignee: Susan LoVerso
Resolution: Done Votes: 0
Labels: 32powercycle, 32qa, FT
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-18316.diff     File crashserver.sh     File crashserver.sh.gz     Text File mongod-wiredTiger-recovery.log     Text File mongod-wiredTiger-recovery.log     File mongod-wiredTiger-recovery.log.gz     File mongod-wiredTiger-recovery.log.gz     Text File mongod-wiredTiger.log     Text File mongod-wiredTiger.log     File mongod-wiredTiger.log.gz     File mongod-wiredTiger.log.gz     File powertest.sh     File powertest2.sh     File startmongo.sh     File startmongo.sh.gz     File wiredTiger-afterrecovery.tar.gz     File wiredTiger-afterrecovery.tar.gz     File wiredTiger-afterrecovery.tar.gz     File wiredTiger-beforerecovery.tar.gz     File wiredTiger-beforerecovery.tar.gz     File wiredTiger-beforerecovery.tar.gz     File wiredTiger-beforerecovery.tar.gz.split1     File wiredTiger-beforerecovery.tar.gz.split2     File wiredTiger-failedrecovery.tar.gz     File wiredTiger-firstrun.tar.gz     File wiredTiger-firstrun.tar.gz     File wiredTiger-firstrun.tar.gz     File wiredTiger-firstrun.tar.gz     File wiredTiger.tar.gz     File wiredTiger.tar.gz     File wiredTiger.tar.gz    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-18275 Mongod recovery fails after power cyc... Closed
is duplicated by SERVER-18337 DB fails to recover when specifying m... Closed
is duplicated by SERVER-18448 mongod start failed Closed
is duplicated by SERVER-18628 Can't start, can't repair after force... Closed
is duplicated by SERVER-18850 Cannot start or --repair after WiredT... Closed
is duplicated by SERVER-19225 Wiredtiger corrupt after shutdown Closed
is duplicated by SERVER-19258 Unexpected wiredTiger assertion error... Closed
is duplicated by SERVER-21986 Failed to repair mongo after system c... Closed
Related
related to SERVER-18838 DB fails to recover creates and drops... Closed
is related to SERVER-69892 WiredTiger error message (22) cannot ... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:
  1. Start mongod with specified options on secret port (recovery mode)
    --storageEngine wiredTiger --wiredTigerJournalCompressor none --wiredTigerCollectionBlockCompressor none --wiredTigerIndexPrefixCompression 0
  2. If startup is successful, kill and restart on start port
  3. Run mongo shell to randomly perform CRUD operations
  4. Crash host system at a random period (40-50 seconds after step 2)
  5. After reboot start from step 1
Participants:

 Description   
Issue Status as of Jun 09, 2015

ISSUE SUMMARY
If a system running MongoDB with the WiredTiger storage engine crashes or experiences an unclean shutdown, MongoDB may not be able to recover data files on restart if the crash/shutdown interrupted a WiredTiger checkpoint.

USER IMPACT
MongoDB cannot automatically recover data files on restart.

WORKAROUNDS
None. Restore data from backups or resync from another replica set member.

AFFECTED VERSIONS
MongoDB 3.0 versions before 3.0.4.

FIX VERSION
The fix is included in the 3.0.4 production release.

Original description

DB recovery failed for WT when the following options were specified:
--wiredTigerJournalCompressor none --wiredTigerCollectionBlockCompressor none --wiredTigerIndexPrefixCompression 0

2015-05-04T16:28:54.660-0400 W -        [initandlisten] Detected unclean shutdown - /data/wiredTiger/mongod.lock is not empty.
2015-05-04T16:28:54.660-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2015-05-04T16:28:54.661-0400 I STORAGE  [initandlisten] wiredtiger_open config:create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log(enabled=true,archive=true,path=journal,compressor=none),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-05-04T16:29:31.728-0400 E STORAGE  [initandlisten] WiredTiger (-31802) [1430771371:728175][1371:0x7f39f5c34b80], file:collection-0--5195437846436728971.wt, WT_SESSION.open_cursor: collection-0--5195437846436728971.wt read error: failed to read 4096 bytes at offset 40960: WT_ERROR: non-specific WiredTiger error
2015-05-04T16:29:31.728-0400 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError-31802:WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 80
2015-05-04T16:29:31.778-0400 I CONTROL  [initandlisten] 0xf4ba16 0xeee0f8 0xed1ee5 0xd6a66d 0xd65a5b 0xd5f824 0xd5ff13 0xd60216 0xd5a404 0xcdbf86 0xcdffa9 0xd5940c 0xcaa860 0x8291ce 0x7ed989 0x7f39f4205ec5 0x8272f7
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B4BA16"},{"b":"400000","o":"AEE0F8"},{"b":"400000","o":"AD1EE5"},{"b":"400000","o":"96A66D"},{"b":"400000","o":"965A5B"},{"b":"400000","o":"95F824"},{"b":"400000","o":"95FF13"},{"b":"400000","o":"960216"},{"b":"400000","o":"95A404"},{"b":"400000","o":"8DBF86"},{"b":"400000","o":"8DFFA9"},{"b":"400000","o":"95940C"},{"b":"400000","o":"8AA860"},{"b":"400000","o":"4291CE"},{"b":"400000","o":"3ED989"},{"b":"7F39F41E4000","o":"21EC5"},{"b":"400000","o":"4272F7"}],"processInfo":{ "mongodbVersion" : "3.1.2", "gitVersion" : "aa0066050f0a9db81aa47181d0fbd18c109ae991", "uname" : { "
sysname" : "Linux", "release" : "3.16.0-31-generic", "version" : "#41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "5A1AB53E82C3717157B58AE3409311D6B24478D5" }, { "b" : "7FFFFE5DB000", "elfType" : 3, "buildId" : "5552B9335DDE9349419BA10896C1E75C9432A946" }, { "b" : "7F39F57CD000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F39F53F3000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "379F80D2768BA6A21F52781895EE9F47B34A0A85" }, { "b" : "7F39F51EB000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F39F4FE7000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F39F4CE3000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F39F49DD000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F39F47C7000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F39F45A9000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F39F41E4000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F39F5A2B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xf4ba16]
 mongod(_ZN5mongo10logContextEPKc+0x138) [0xeee0f8]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xA5) [0xed1ee5]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0x1FD) [0xd6a66d]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x4B) [0xd65a5b]
 mongod(_ZN5mongo21WiredTigerRecordStore8IteratorC2ERKS0_PNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionEb+0x64) [0xd5f824]
 mongod(_ZNK5mongo21WiredTigerRecordStore11getIteratorEPNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionE+0x43) [0xd5ff13]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_bllPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x2B6) [0xd60216]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0xC4) [0xd5a404]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x1D6) [0xcdbf86]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x579) [0xcdffa9]
 mongod(+0x95940C) [0xd5940c]
 mongod(_ZN5mongo20ServiceContextMongoD22setGlobalStorageEngineERKSs+0x350) [0xcaa860]
 mongod(_ZN5mongo13initAndListenEi+0x3EE) [0x8291ce]
 mongod(main+0x139) [0x7ed989]
 libc.so.6(__libc_start_main+0xF5) [0x7f39f4205ec5]
 mongod(+0x4272F7) [0x8272f7]
-----  END BACKTRACE  -----
2015-05-04T16:29:31.778-0400 I -        [initandlisten]
 
***aborting after invariant() failure



 Comments   
Comment by Githook User [ 14/May/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: If we're closing a dirty tree in order to verify it (or any other exclusive operation), make sure the changes are stable on disk in case there is a crash.

refs SERVER-18316
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/9bf96e1fcb16835c2feb51200bee43e5482c7d5b

Comment by Githook User [ 13/May/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: Fix a bug in checkpoint where we committed the transaction too early.

It's not safe to commit the transaction before the files have been
synced to disk - if we do that the checkpoint can be rolled forward
in recovery before the checkpoint in the file has been updated.

Refs SERVER-18316
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/5ebdc72ca013ead51d7eb8e27eb8cb2d8090daf0

Comment by Githook User [ 12/May/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Merge pull request #1957 from wiredtiger/checkpoint-sync

Sync the metadata file before updating turtle file. SERVER-18316
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/df5a23d353d141afcbdb6cdced0538c87e99d389

Comment by Githook User [ 12/May/15 ]

Author:

{u'username': u'sueloverso', u'name': u'Susan LoVerso', u'email': u'sue@wiredtiger.com'}

Message: Sync the metadata file before updating turtle file. SERVER-18316
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/1880ed81fe765aed24f7e328388ec739dd1ee3e3

Comment by Jonathan Abrahams [ 07/May/15 ]

Attaching another set of logs & DBs from crash:

  • mongod-wiredTiger-recovery.log
  • mongod-wiredTiger.log
  • wiredTiger-afterrecovery.tar.gz
  • wiredTiger-beforerecovery.tar.gz.split1
  • wiredTiger-beforerecovery.tar.gz.split2
  • wiredTiger-firstrun.tar.gz
  • wiredTiger.tar.gz

cat wiredTiger-beforerecovery.tar.gz.split1 wiredTiger-beforerecovery.tar.gz.split2 > wiredTiger-beforerecovery.tar.gz

Comment by Jonathan Abrahams [ 05/May/15 ]

Tested in the following configurations:

  • --wiredTigerIndexPrefixCompression 0, there was no crash
  • --wiredTigerJournalCompressor none, there was a crash
  • --wiredTigerCollectionBlockCompressor none, there was no crash
Generated at Thu Feb 08 03:47:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.