[SERVER-18275] Mongod recovery fails after power cycle for Wired Tiger Created: 30/Apr/15  Updated: 26/May/15  Resolved: 26/May/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.1.2
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jonathan Abrahams Assignee: Michael Cahill (Inactive)
Resolution: Duplicate Votes: 0
Labels: 32powercycle, 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File CAP-1863.js     File crashserver.sh     File crashserver.sh     File mongo-client.sh     Text File mongod-wiredTiger-recovery.log     Text File mongod-wiredTiger.log     File startmongo.sh     File startmongo.sh     File wiredTiger-afterrecover.tar.gz     File wiredTiger-beforerecovery.tar.gz     File wiredTiger-firstrun.tar.gz     File wiredTiger.tar.gz    
Issue Links:
Duplicate
duplicates SERVER-18316 Database with WT engine fails to reco... Closed
is duplicated by SERVER-18477 mongod 3.0.3 start failing due to cor... Closed
is duplicated by SERVER-18628 Can't start, can't repair after force... Closed
Tested
Operating System: ALL
Steps To Reproduce:

The power cycle test

  1. start mongod with wiredTiger on port 37017 on Server host
  2. If mongod starts successfully, kill monogd and restart on port 27017 on Server host
  3. 3 external clients connect on port 27017 and randomly execute CRUD operations on a collection on Client host
  4. Force an immediate crash and reboot on Server host
  5. Repeat from step 1
Participants:

 Description   

The power cycle test triggered a fatal recovery:

2015-04-29T19:48:01.984-0400 I CONTROL  ***** SERVER RESTARTED *****
2015-04-29T19:48:02.385-0400 W -        [initandlisten] Detected unclean shutdown - /data/wiredTiger/m
ongod.lock is not empty.
2015-04-29T19:48:02.385-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint
.
2015-04-29T19:48:02.427-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-04-29T19:48:21.023-0400 E STORAGE  [initandlisten] WiredTiger (0) [1430351301:23444][1352:0x7fc1b91c4b80], file:collection-0--8632498072019552590.wt: read checksum error [36864B @ 422645760, 1753217059 != 675170273]
2015-04-29T19:48:21.023-0400 E STORAGE  [initandlisten] WiredTiger (0) [1430351301:23498][1352:0x7fc1b91c4b80], file:collection-0--8632498072019552590.wt: collection-0--8632498072019552590.wt: encountered an illegal file format or internal value
2015-04-29T19:48:21.023-0400 E STORAGE  [initandlisten] WiredTiger (-31804) [1430351301:23507][1352:0x7fc1b91c4b80], file:collection-0--8632498072019552590.wt: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-04-29T19:48:21.023-0400 I -        [initandlisten] Fatal Assertion 28558
2015-04-29T19:48:21.168-0400 I CONTROL  [initandlisten]
 0xf4ba16 0xeee0f8 0xed0f8d 0xd6d4f4 0x134dbf9 0x134dda5 0x134e1b4 0x12aa16e 0x12aa608 0x12a8213 0x12ab336 0x12c3731 0x12ea313 0x134d133 0x134d25d 0x12f6191 0x134b4da 0x1356aef 0x1356d72 0x130d60c 0x1357846 0x12ed446 0x12e7cf4 0xd5b4f4 0xd5938b 0xcaa860 0x8291ce 0x7ed989 0x7fc1b7795ec5 0x8272f7
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B4BA16"},{"b":"400000","o":"AEE0F8"},{"b":"400000","o":"AD0F8D"},{"b":"400000","o":"96D4F4"},{"b":"400000","o":"F4DBF9"},{"b":"400000","o":"F4DDA5"},{"b":"400000","o":"F4E1B4"},{"b":"400000","o":"EAA16E"},{"b":"400000","o":"EAA608"},{"b":"400000","o":"EA8213"},{"b":"400000","o":"EAB336"},{"b":"400000","o":"EC3731"},{"b":"400000","o":"EEA313"},{"b":"400000","o":"F4D133"},{"b":"400000","o":"F4D25D"},{"b":"400000","o":"EF6191"},{"b":"400000","o":"F4B4DA"},{"b":"400000","o":"F56AEF"},{"b":"400000","o":"F56D72"},{"b":"400000","o":"F0D60C"},{"b":"400000","o":"F57846"},{"b":"400000","o":"EED446"},{"b":"400000","o":"EE7CF4"},{"b":"400000","o":"95B4F4"},{"b":"400000","o":"95938B"},{"b":"400000","o":"8AA860"},{"b":"400000","o":"4291CE"},{"b":"400000","o":"3ED989"},{"b":"7FC1B7774000","o":"21EC5"},{"b":"400000","o":"4272F7"}],"processInfo":{ "mongodbVersion" : "3.1.2", "gitVersion" : "aa0066050f0a9db81aa47181d0fbd18c109ae991", "uname" : { "sysname" : "Linux", "release" : "3.16.0-31-generic", "version" : "#41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "5A1AB53E82C3717157B58AE3409311D6B24478D5" }, { "b" : "7FFF681EA000", "elfType" : 3, "buildId" : "5552B9335DDE9349419BA10896C1E75C9432A946" }, { "b" : "7FC1B8D5D000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7FC1B8983000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "379F80D2768BA6A21F52781895EE9F47B34A0A85" }, { "b" : "7FC1B877B000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FC1B8577000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FC1B8273000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7FC1B7F6D000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FC1B7D57000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7FC1B7B39000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FC1B7774000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FC1B8FBB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xf4ba16]
 mongod(_ZN5mongo10logContextEPKc+0x138) [0xeee0f8]
 mongod(_ZN5mongo13fassertFailedEi+0x4D) [0xed0f8d]
 mongod(+0x96D4F4) [0xd6d4f4]
 mongod(__wt_eventv+0x3E9) [0x134dbf9]
 mongod(__wt_err+0x95) [0x134dda5]
 mongod(__wt_panic+0x24) [0x134e1b4]
 mongod(__wt_block_extlist_read+0x6E) [0x12aa16e]
 mongod(__wt_block_extlist_read_avail+0x28) [0x12aa608]
 mongod(__wt_block_checkpoint_load+0x193) [0x12a8213]
 mongod(+0xEAB336) [0x12ab336]
 mongod(__wt_btree_open+0xAB1) [0x12c3731]
 mongod(__wt_conn_btree_get+0x243) [0x12ea313]
 mongod(__wt_session_get_btree+0x323) [0x134d133]
 mongod(__wt_session_get_btree_ckpt+0xBD) [0x134d25d]
 mongod(__wt_curfile_open+0xE1) [0x12f6191]
 mongod(__wt_open_cursor+0x26A) [0x134b4da]
 mongod(+0xF56AEF) [0x1356aef]
 mongod(+0xF56D72) [0x1356d72]
 mongod(__wt_log_scan+0x8BC) [0x130d60c]
 mongod(__wt_txn_recover+0x346) [0x1357846]
 mongod(__wt_connection_workers+0x46) [0x12ed446]
 mongod(wiredtiger_open+0x1124) [0x12e7cf4]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x2E4) [0xd5b4f4]
 mongod(+0x95938B) [0xd5938b]
 mongod(_ZN5mongo20ServiceContextMongoD22setGlobalStorageEngineERKSs+0x350) [0xcaa860]
 mongod(_ZN5mongo13initAndListenEi+0x3EE) [0x8291ce]
 mongod(main+0x139) [0x7ed989]
 libc.so.6(__libc_start_main+0xF5) [0x7fc1b7795ec5]
 mongod(+0x4272F7) [0x8272f7]
-----  END BACKTRACE  -----
2015-04-29T19:48:21.168-0400 I -        [initandlisten]
 
***aborting after fassert() failure



 Comments   
Comment by Jonathan Abrahams [ 08/May/15 ]

Please note, updated startmongo.sh & crashserver.sh

cat wiredTiger-beforerecovery.tar.gz.split* > wiredTiger-beforerecovery.tar.gz

Generated at Thu Feb 08 03:47:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.