[SERVER-17152] WiredTiger file corrupted during power cycle test Created: 02/Feb/15  Updated: 20/Sep/17  Resolved: 23/Apr/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.0.0-rc7, 3.0.0-rc8
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jonathan Abrahams Assignee: Bruce Lucas (Inactive)
Resolution: Done Votes: 0
Labels: 28qa, FT, cap-ss
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File CAP-1863-novalidate.js     File cap-1863.js     Text File mongod-wiredTiger.log     File mongod.log     File mongod.log-2015-02-02.tar.gz     File syslog-feb2.gz     File wt-corrupted1.tar.gz     File wt-failure-3.tar.gz    
Issue Links:
Related
related to SERVER-17818 Abrupt termination due to service shu... Closed
related to SERVER-17086 Can't start MongoDB w/ WiredTiger Eng... Closed
related to SERVER-17204 Crash before completion of first chec... Closed
related to SERVER-17451 WiredTiger unable to start if crash l... Closed
related to SERVER-17613 Unable to start mongod after unclean ... Closed
is related to SERVER-17587 Node crash scenario results in uncrec... Closed
is related to SERVER-17086 Can't start MongoDB w/ WiredTiger Eng... Closed
is related to SERVER-17210 Crash can leave data inaccessible due... Closed
is related to SERVER-17571 WiredTiger unable to start if crash l... Closed
Tested
Operating System: ALL
Participants:
Case:

 Description   

The power cycle test crashes the server uncleanly at random points. After each reboot, mongod is run with --repair.
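The post-reboot repair step could be scripted roughly as follows. This is a minimal sketch, not the actual test client: the helper names `find_failure_markers` and `run_repair`, the default `/data/db` path, and the choice of strings treated as failure signals are all assumptions drawn from the log in this ticket.

```python
import subprocess

# Strings taken from the log in this ticket; treating them as failure
# signals is an assumption, not a documented mongod contract.
FAILURE_MARKERS = ("read checksum error", "WT_PANIC", "Fatal Assertion")


def find_failure_markers(output):
    """Return the known failure markers that appear in mongod's output."""
    return [marker for marker in FAILURE_MARKERS if marker in output]


def run_repair(dbpath="/data/db"):
    """Run the repair command and return (exit code, markers found).

    Hypothetical helper: requires mongod on PATH and Python 3.7+.
    """
    proc = subprocess.run(
        ["mongod", "--storageEngine", "wiredTiger", "--repair", "--dbpath", dbpath],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, find_failure_markers(proc.stdout)
```

A harness would call `run_repair()` once after each reboot and stop cycling as soon as the exit code is non-zero or any marker is found, so the dbpath can be preserved for analysis.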

In this instance, the repair failed with a WiredTiger read checksum error on file:WiredTiger.wt:

 mongod --storageEngine wiredTiger --repair
2015-02-02T14:58:43.936-0500 I CONTROL  [initandlisten] MongoDB starting : pid=1958 port=27017 dbpath=/data/db 64-bit host=WTCrash
2015-02-02T14:58:43.936-0500 I CONTROL  [initandlisten]
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten]
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten]
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] db version v3.0.0-rc7
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] git version: e4c60053b2967e16f765fa25d16aa6d629faa196
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] build info: Linux build14.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] allocator: tcmalloc
2015-02-02T14:58:43.937-0500 I CONTROL  [initandlisten] options: { repair: true, storage: { engine: "wiredTiger" } }
2015-02-02T14:58:43.957-0500 W -        [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2015-02-02T14:58:43.957-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2015-02-02T14:58:43.957-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-02-02T14:58:43.961-0500 E STORAGE  [initandlisten] WiredTiger (0) [1422907123:961875][1958:0x7f55c79aebc0], file:WiredTiger.wt, connection: read checksum error [4096B @ 28672, 3705540348 != 3545740451]
2015-02-02T14:58:43.962-0500 E STORAGE  [initandlisten] WiredTiger (0) [1422907123:962064][1958:0x7f55c79aebc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2015-02-02T14:58:43.962-0500 E STORAGE  [initandlisten] WiredTiger (-31804) [1422907123:962166][1958:0x7f55c79aebc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-02-02T14:58:43.962-0500 I -        [initandlisten] Fatal Assertion 28558
2015-02-02T14:58:43.975-0500 I CONTROL  [initandlisten]
 0xf3a9d9 0xee4b21 0xec8f41 0xd50756 0x1363350 0x1363615 0x1363a81 0x12befee 0x12bf488 0x12bc833 0x12c01c6 0x12d6849 0x12fc40b 0x13629c8 0x1330a3b 0x12f9baa 0xd52234 0xd4fd78 0xa6543d 0x7e0237 0x7e5459 0x7f55c65b8ec5 0x7ddf29
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3A9D9"},{"b":"400000","o":"AE4B21"},{"b":"400000","o":"AC8F41"},{"b":"400000","o":"950756"},{"b":"400000","o":"F63350"},{"b":"400000","o":"F63615"},{"b":"400000","o":"F63A81"},{"b":"400000","o":"EBEFEE"},{"b":"400000","o":"EBF488"},{"b":"400000","o":"EBC833"},{"b":"400000","o":"EC01C6"},{"b":"400000","o":"ED6849"},{"b":"400000","o":"EFC40B"},{"b":"400000","o":"F629C8"},{"b":"400000","o":"F30A3B"},{"b":"400000","o":"EF9BAA"},{"b":"400000","o":"952234"},{"b":"400000","o":"94FD78"},{"b":"400000","o":"66543D"},{"b":"400000","o":"3E0237"},{"b":"400000","o":"3E5459"},{"b":"7F55C6597000","o":"21EC5"},{"b":"400000","o":"3DDF29"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc7", "gitVersion" : "e4c60053b2967e16f765fa25d16aa6d629faa196", "uname" : { "sysname" : "Linux", "release" : "3.13.0-32-generic", "version" : "#57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFCAAFE000", "elfType" : 3 }, { "b" : "7F55C7589000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F55C7381000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F55C717D000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F55C6E79000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F55C6B73000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F55C695D000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F55C6597000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F55C77A7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3a9d9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xee4b21]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xec8f41]
 mongod(+0x950756) [0xd50756]
 mongod(+0xF63350) [0x1363350]
 mongod(__wt_err+0x95) [0x1363615]
 mongod(__wt_panic+0x21) [0x1363a81]
 mongod(__wt_block_extlist_read+0x6E) [0x12befee]
 mongod(__wt_block_extlist_read_avail+0x28) [0x12bf488]
 mongod(__wt_block_checkpoint_load+0x193) [0x12bc833]
 mongod(+0xEC01C6) [0x12c01c6]
 mongod(__wt_btree_open+0xB69) [0x12d6849]
 mongod(__wt_conn_btree_get+0x19B) [0x12fc40b]
 mongod(__wt_session_get_btree+0x2D8) [0x13629c8]
 mongod(__wt_metadata_open+0x2B) [0x1330a3b]
 mongod(wiredtiger_open+0xECA) [0x12f9baa]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x304) [0xd52234]
 mongod(+0x94FD78) [0xd4fd78]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa6543d]
 mongod(_ZN5mongo13initAndListenEi+0x6F7) [0x7e0237]
 mongod(main+0x139) [0x7e5459]
 libc.so.6(__libc_start_main+0xF5) [0x7f55c65b8ec5]
 mongod(+0x3DDF29) [0x7ddf29]
-----  END BACKTRACE  -----
2015-02-02T14:58:43.976-0500 I -        [initandlisten]
 
***aborting after fassert() failure
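
The bracketed detail in the checksum error above can be pulled apart with a short script when comparing failures across runs. This is a sketch under assumptions: the field names (`size`, `offset`, `left`, `right`, `block_index`) are made up for illustration, and the message does not say which of the two checksum values was stored and which was computed, so they are labelled only by position.

```python
import re

# Matches the detail of the "read checksum error" message as it appears
# in this ticket's log. Other WiredTiger versions may word it differently.
CHECKSUM_RE = re.compile(
    r"read checksum error \[(?P<size>\d+)B @ (?P<offset>\d+), "
    r"(?P<left>\d+) != (?P<right>\d+)\]"
)


def parse_checksum_error(line):
    """Return the numeric fields of a checksum-error line, or None if absent."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Offset expressed in units of the block size, plus any remainder.
    fields["block_index"], fields["misalignment"] = divmod(
        fields["offset"], fields["size"]
    )
    return fields


line = (
    "2015-02-02T14:58:43.961-0500 E STORAGE  [initandlisten] WiredTiger (0) "
    "[1422907123:961875][1958:0x7f55c79aebc0], file:WiredTiger.wt, connection: "
    "read checksum error [4096B @ 28672, 3705540348 != 3545740451]"
)
info = parse_checksum_error(line)
print(info["size"], info["offset"], info["block_index"], info["misalignment"])
# prints: 4096 28672 7 0
```

For this failure, the bad read is a 4096-byte block at byte offset 28672 of WiredTiger.wt, which is exactly seven block sizes into the file.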

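The JSON backtrace above lists each frame as a pair of hex strings, "b" and "o". Judging by the "somap" entries in the same record, "b" appears to be the load base of the containing object and "o" the offset within it, so their sum should equal the raw address printed before the backtrace. The sketch below checks that reading on three frames copied from the log; the function name `absolute_addresses` is hypothetical.

```python
import json


def absolute_addresses(backtrace_json):
    """Return base + offset for every frame in a mongod JSON backtrace."""
    doc = json.loads(backtrace_json)
    return [int(frame["b"], 16) + int(frame["o"], 16) for frame in doc["backtrace"]]


# First two frames and the libc frame, copied from the backtrace in this ticket.
sample = (
    '{"backtrace":['
    '{"b":"400000","o":"B3A9D9"},'
    '{"b":"400000","o":"AE4B21"},'
    '{"b":"7F55C6597000","o":"21EC5"}]}'
)
addresses = [hex(a) for a in absolute_addresses(sample)]
print(addresses)
# prints: ['0xf3a9d9', '0xee4b21', '0x7f55c65b8ec5']
```

These values match the raw addresses for printStackTrace, logContext, and __libc_start_main in the symbolized frames, which supports the base-plus-offset interpretation.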


 Comments   
Comment by Jonathan Abrahams [ 23/Apr/15 ]

Since this spawned all the linked tickets, there's no further work to be done here.

Comment by Jonathan Abrahams [ 16/Mar/15 ]

Note: the remove issue has only been observed in 3.0.1-rc0 on an Ubuntu 14 VM hosted in KVM/QEMU. It has not been observed on a CentOS 6 VM hosted in VMware.

Comment by Jonathan Abrahams [ 13/Mar/15 ]

Attached: power cycle client with novalidate (includes remove).

Comment by Jonathan Abrahams [ 13/Mar/15 ]

Attached: mongod log file that contains the remove error.

Comment by Daniel Pasette (Inactive) [ 02/Feb/15 ]

We need exact reproduction steps.
How much data is in this instance? If not large, can you make the entire contents of the dbpath available?
