[SERVER-16172] mongod --repair terminates before repair is attempted under WiredTiger
Created: 15/Nov/14  Updated: 06/Apr/16  Resolved: 15/Jan/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc0, 2.8.0-rc4
Fix Version/s: 3.0.0-rc6

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: Mathias Stearn
Resolution: Done
Votes: 0
Labels: WTplaybook, wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File db.tgz    
Issue Links:
Duplicate
duplicates SERVER-16173 mongod --repair not working under Wir... Closed
is duplicated by SERVER-16596 Repairing a WiredTiger database with ... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

During initialization under WiredTiger, some data is preloaded in setGlobalStorageEngine() before repair is attempted. If any of the preloaded data is damaged, mongod terminates with an fassert:

 mongod(_ZN5mongo13fassertFailedEi+0x19E) [0x1063183ae]
 mongod(_ZN5mongo12_GLOBAL__N_116mdb_handle_errorEP18__wt_event_handlerP12__wt_sessioniPKc+0xDE) [0x106199cee]
 mongod(__wt_eventv+0x4A3) [0x1067a6763]
 mongod(__wt_err+0x99) [0x1067a68b9]
 mongod(__wt_illegal_value+0x63) [0x1067a6ed3]
 mongod(__wt_bm_preload+0xA1) [0x106726b41]
 mongod(__wt_btree_open+0x12AF) [0x106734bff]
 mongod(__wt_conn_btree_get+0x4DF) [0x10676102f]
 mongod(__wt_session_get_btree+0x26D) [0x1067a5d1d]
 mongod(__wt_session_get_btree_ckpt+0xD1) [0x1067a5a61]
 mongod(__wt_curfile_open+0x108) [0x10676b6b8]
 mongod(__wt_open_cursor+0x110) [0x1067a25d0]
 mongod(__wt_curtable_open+0xF7) [0x10677b277]
 mongod(__wt_open_cursor+0x252) [0x1067a2712]
 mongod(__session_open_cursor+0x1A5) [0x1067a32e5]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEy+0x13E) [0x1061a47fe]
 mongod(_ZN5mongo16WiredTigerCursor5_initERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEyPNS_22WiredTigerRecoveryUnitE+0x5F) [0x1061a37af]
 mongod(_ZN5mongo21WiredTigerRecordStore8IteratorC2ERKS0_PNS_16OperationContextERKNS_7DiskLocERKNS_20CollectionScanParams9DirectionEb+0x5A) [0x1061a173a]
 mongod(_ZNK5mongo21WiredTigerRecordStore11getIteratorEPNS_16OperationContextERKNS_7DiskLocERKNS_20CollectionScanParams9DirectionE+0x3E) [0x10619fd8e]
 mongod(_ZN5mongo21WiredTigerRecordStoreC2EPNS_16OperationContextERKNS_10StringDataES5_bxxPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x51D) [0x10619d2fd]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextERKNS_10StringDataES5_RKNS_17CollectionOptionsE+0x134) [0x10619ab94]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEE+0x12A) [0x106124d3a]
 mongod(_ZN5mongo15KVStorageEngine10finishInitEv+0x5BB) [0x106127c4b]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x45) [0x105e7bb85]
 mongod(_ZN5mongoL14_initAndListenEi+0xC8C) [0x105bdd7ec]
 mongod(_ZN5mongo13initAndListenEi+0x13) [0x105bdc613]

The consequence is that repair is never attempted for some kinds of damage. Experiments suggest this affects damage to the block manager pages, the root btree page, and at least some internal btree pages.



 Comments   
Comment by Githook User [ 15/Jan/15 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: Better repair for WT

  • Doesn't construct RecordStores or indexes before they've been repaired
  • No longer need to skip checking index versions.
  • Updates numRecords and dataSize after the repair.

Related issues:
SERVER-16817 Skip checking index versions in WT during --repair
SERVER-16172 --repair fails before repairing collections in WT

A call to flushAllFiles is commented out due to SERVER-16869. Resolving it
should uncomment that line.
Branch: master
https://github.com/mongodb/mongo/commit/48deaff5bf31d21807c5dd7e7f5d313c7c96e8dc
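
The ordering problem this commit addresses can be illustrated with a toy model. The sketch below is not MongoDB or WiredTiger code: ToyTable, open_first, and repair_then_open are hypothetical names invented for illustration. It only assumes what the report and the log describe, namely that opening a cursor on a damaged table is fatal, while a verify pass followed by a salvage pass on failure can run without opening it.

```python
class ToyTable:
    """Hypothetical stand-in for a storage-engine table (not a WiredTiger API)."""

    def __init__(self, uri, damaged=False):
        self.uri = uri
        self.damaged = damaged

    def open_cursor(self):
        # Models the fatal path in the report: reading a damaged page aborts.
        if self.damaged:
            raise RuntimeError("read checksum error in " + self.uri)
        return "cursor:" + self.uri

    def verify(self):
        # Returns True when the table is intact.
        return not self.damaged

    def salvage(self):
        # Models a repair pass that makes the table openable again.
        self.damaged = False


def open_first(tables):
    """Old ordering: tables are opened before any repair has run."""
    return [t.open_cursor() for t in tables]


def repair_then_open(tables):
    """New ordering: verify each table, salvage on failure, then open."""
    log = []
    for t in tables:
        if t.verify():
            log.append("verify ok: " + t.uri)
        else:
            t.salvage()
            log.append("salvaged: " + t.uri)
    return log, [t.open_cursor() for t in tables]
```

With one damaged table in the list, open_first raises before any repair can run, whereas repair_then_open salvages the table and then opens every cursor.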

Comment by Bruce Lucas (Inactive) [ 12/Jan/15 ]

This is still occurring in rc4. The attached db has the 4 KB at offset 0x001bf000, which is the root node, overwritten with zeros. The log shows that collection data is loaded during initialization, before repair is attempted.
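
For anyone wanting to produce similar damage on a scratch file, a minimal sketch follows. It assumes nothing about the WiredTiger file format: it zeroes 4 KB at the reported offset in a temporary file filled with 0xFF bytes, and the helper names (zero_range, make_scratch_file) are hypothetical. It should only ever be pointed at a disposable copy of a data file. Note that 0x001bf000 is 1830912 in decimal, which matches the "4096B @ 1830912" in the read checksum error in the log.

```python
import os
import tempfile

PAGE_SIZE = 4096        # size of the overwritten region, from the report
OFFSET = 0x001BF000     # offset of the damaged root node, from the report


def zero_range(path, offset, length):
    """Overwrite `length` bytes at `offset` with zeros, in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(b"\x00" * length)


def make_scratch_file(size):
    """Create a temporary file of `size` bytes, all 0xFF, and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xff" * size)
    return path


# Demonstrate on a scratch file rather than a real collection file.
scratch = make_scratch_file(OFFSET + 2 * PAGE_SIZE)
zero_range(scratch, OFFSET, PAGE_SIZE)
with open(scratch, "rb") as f:
    f.seek(OFFSET)
    block = f.read(PAGE_SIZE)   # the zeroed region
    after = f.read(1)           # first byte past the zeroed region
os.remove(scratch)
```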

2015-01-12T08:11:34.326-0500 I CONTROL  [initandlisten] MongoDB starting : pid=7742 port=27017 dbpath=/Users/bdlucas/db/db/r0 64-bit host=reboot.local
2015-01-12T08:11:34.326-0500 I CONTROL  [initandlisten] 
2015-01-12T08:11:34.326-0500 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
2015-01-12T08:11:34.327-0500 I CONTROL  [initandlisten] db version v2.8.0-rc4
2015-01-12T08:11:34.327-0500 I CONTROL  [initandlisten] git version: 3ad571742911f04b307f0071979425511c4f2570
2015-01-12T08:11:34.327-0500 I CONTROL  [initandlisten] build info: Darwin mci-osx108-7.build.10gen.cc 12.5.0 Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49
2015-01-12T08:11:34.335-0500 I CONTROL  [initandlisten] allocator: system
2015-01-12T08:11:34.335-0500 I CONTROL  [initandlisten] options: { repair: true, storage: { dbPath: "/Users/bdlucas/db/db/r0", engine: "wiredtiger" } }
2015-01-12T08:11:34.335-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,extensions=[local=(entry=index_collator_extension)],statistics=(fast),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-01-12T08:11:35.498-0500 I STORAGE  [initandlisten] Repairing size cache
2015-01-12T08:11:35.507-0500 I STORAGE  [initandlisten] WiredTiger progress session.verify 2
2015-01-12T08:11:35.507-0500 I STORAGE  [initandlisten] Verify succeeded on uri table:sizeStorer. Not salvaging.
2015-01-12T08:11:35.508-0500 I STORAGE  [initandlisten] Repairing catalog metadata
2015-01-12T08:11:35.508-0500 I STORAGE  [initandlisten] WiredTiger progress session.verify 2
2015-01-12T08:11:35.508-0500 I STORAGE  [initandlisten] Verify succeeded on uri table:_mdb_catalog. Not salvaging.
2015-01-12T08:11:35.515-0500 E STORAGE  [initandlisten] WiredTiger (0) [1421068295:515662][7742:0x7fff7db4c310], file:collection-2-2881489413117401815.wt, session.open_cursor: read checksum error [4096B @ 1830912, 1345086692 != 0]
2015-01-12T08:11:35.515-0500 E STORAGE  [initandlisten] WiredTiger (0) [1421068295:515720][7742:0x7fff7db4c310], file:collection-2-2881489413117401815.wt, session.open_cursor: collection-2-2881489413117401815.wt: encountered an illegal file format or internal value
2015-01-12T08:11:35.515-0500 E STORAGE  [initandlisten] WiredTiger (-31804) [1421068295:515759][7742:0x7fff7db4c310], file:collection-2-2881489413117401815.wt, session.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-01-12T08:11:35.553-0500 I -        [initandlisten] Fatal Assertion 28558
2015-01-12T08:11:35.561-0500 I CONTROL  [initandlisten] 
 0x10e782479 0x10e736380 0x10e724ad6 0x10e5ad165 0x10ebb60b3 0x10ebb6209 0x10ebb6820 0x10eb316f0 0x10eb3f9a9 0x10eb3ef03 0x10eb3e4b4 0x10eb5e628 0x10ebb549e 0x10ebb51d1 0x10eb69b6e 0x10ebb19f0 0x10eb79ba7 0x10ebb1b32 0x10ebb27b8 0x10e5b9a2e 0x10e5b8baf 0x10e5b6b7c 0x10e5b4f5c 0x10e5b2353 0x10e5ae7c4 0x10e52ff9a 0x10e5333ae 0x10e5ac053 0x10e2d03c7 0x10e040645 0x10e03f463 0x10e0442bc 0x10e03f444 0x6
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"10E03E000","o":"744479"},{"b":"10E03E000","o":"6F8380"},{"b":"10E03E000","o":"6E6AD6"},{"b":"10E03E000","o":"56F165"},{"b":"10E03E000","o":"B780B3"},{"b":"10E03E000","o":"B78209"},{"b":"10E03E000","o":"B78820"},{"b":"10E03E000","o":"AF36F0"},{"b":"10E03E000","o":"B019A9"},{"b":"10E03E000","o":"B00F03"},{"b":"10E03E000","o":"B004B4"},{"b":"10E03E000","o":"B20628"},{"b":"10E03E000","o":"B7749E"},{"b":"10E03E000","o":"B771D1"},{"b":"10E03E000","o":"B2BB6E"},{"b":"10E03E000","o":"B739F0"},{"b":"10E03E000","o":"B3BBA7"},{"b":"10E03E000","o":"B73B32"},{"b":"10E03E000","o":"B747B8"},{"b":"10E03E000","o":"57BA2E"},{"b":"10E03E000","o":"57ABAF"},{"b":"10E03E000","o":"578B7C"},{"b":"10E03E000","o":"576F5C"},{"b":"10E03E000","o":"574353"},{"b":"10E03E000","o":"5707C4"},{"b":"10E03E000","o":"4F1F9A"},{"b":"10E03E000","o":"4F53AE"},{"b":"10E03E000","o":"56E053"},{"b":"10E03E000","o":"2923C7"},{"b":"10E03E000","o":"2645"},{"b":"10E03E000","o":"1463"},{"b":"10E03E000","o":"62BC"},{"b":"10E03E000","o":"1444"},{"b":"0","o":"6"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc4", "gitVersion" : "3ad571742911f04b307f0071979425511c4f2570", "uname" : { "sysname" : "Darwin", "release" : "13.4.0", "version" : "Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64", "machine" : "x86_64" }, "somap" : [ { "path" : "/Users/bdlucas/mongodb/mongodb-osx-x86_64-2.8.0-rc4/bin/mongod", "machType" : 2, "b" : "10E03E000", "buildId" : "DD7253EC39F535728BD65A336DDDA5E4" }, { "path" : "/usr/lib/libSystem.B.dylib", "machType" : 6, "b" : "7FFF936A6000", "buildId" : "70B235FCBCED367BBA6C67C299BAE7D9" }, { "path" : "/usr/lib/libc++.1.dylib", "machType" : 6, "b" : "7FFF93592000", "buildId" : "4F68DFC5207739A8A449CAC5FDEE7BDE" }, { "path" : "/usr/lib/system/libcache.dylib", "machType" : 6, "b" : "7FFF8E787000", "buildId" : "BDC1E65B72A13DA3A57CB23159CAAD0B" }, { "path" : "/usr/lib/system/libcommonCrypto.dylib", "machType" : 6, "b" : "7FFF8D2D7000", 
"buildId" : "8C4F0CA0389C3EDCB155E62DD2187E1D" }, { "path" : "/usr/lib/system/libcompiler_rt.dylib", "machType" : 6, "b" : "7FFF8FBF4000", "buildId" : "4CD916B21B17362AB403EF24A1DAC141" }, { "path" : "/usr/lib/system/libcopyfile.dylib", "machType" : 6, "b" : "7FFF8FA50000", "buildId" : "CF29DFF605893590834C82E2316612E8" }, { "path" : "/usr/lib/system/libcorecrypto.dylib", "machType" : 6, "b" : "7FFF985A6000", "buildId" : "F3973C2814B63006BB2B00DD7F09ABC7" }, { "path" : "/usr/lib/system/libdispatch.dylib", "machType" : 6, "b" : "7FFF8FBFF000", "buildId" : "C4E4A18D3C3B3C9C8709A4270D998DE7" }, { "path" : "/usr/lib/system/libdyld.dylib", "machType" : 6, "b" : "7FFF920B4000", "buildId" : "41077DD7F9093B8A863E72AE304EDE13" }, { "path" : "/usr/lib/system/libkeymgr.dylib", "machType" : 6, "b" : "7FFF94F87000", "buildId" : "3AA8D85DCF003BD3A5A0E28E1A32A6D8" }, { "path" : "/usr/lib/system/liblaunch.dylib", "machType" : 6, "b" : "7FFF97ADC000", "buildId" : "A40A0C7B321639B48AE0B5D3BAF1DA8A" }, { "path" : "/usr/lib/system/libmacho.dylib", "machType" : 6, "b" : "7FFF93343000", "buildId" : "1D2910DFC0363A82A3FD44FF73B5FF9B" }, { "path" : "/usr/lib/system/libquarantine.dylib", "machType" : 6, "b" : "7FFF979A9000", "buildId" : "7A1A2BCBC03D3A25BFA43E569B2D2C38" }, { "path" : "/usr/lib/system/libremovefile.dylib", "machType" : 6, "b" : "7FFF9812E000", "buildId" : "3543F917928E3DB2A2F47AB73B4970EF" }, { "path" : "/usr/lib/system/libsystem_asl.dylib", "machType" : 6, "b" : "7FFF98234000", "buildId" : "655FB34352CF3E2FB14DBEBF5AAEF94D" }, { "path" : "/usr/lib/system/libsystem_blocks.dylib", "machType" : 6, "b" : "7FFF8D3A9000", "buildId" : "FB856CD12AEA39078E9B1E54B6827F82" }, { "path" : "/usr/lib/system/libsystem_c.dylib", "machType" : 6, "b" : "7FFF8E69E000", "buildId" : "6FD3A4004BB23B95B90CBE6E9D0D78FA" }, { "path" : "/usr/lib/system/libsystem_configuration.dylib", "machType" : 6, "b" : "7FFF930BB000", "buildId" : "4998CB6A9D54390A9F575D1AC53C135C" }, { "path" : 
"/usr/lib/system/libsystem_dnssd.dylib", "machType" : 6, "b" : "7FFF935E5000", "buildId" : "3F8C6A0730463E88858FD9CEFC43A405" }, { "path" : "/usr/lib/system/libsystem_info.dylib", "machType" : 6, "b" : "7FFF8E618000", "buildId" : "7D41A156D2853849A2C3C04ADE797D98" }, { "path" : "/usr/lib/system/libsystem_kernel.dylib", "machType" : 6, "b" : "7FFF9AD86000", "buildId" : "9EDE872E2A9E3A788E1DAB790794A098" }, { "path" : "/usr/lib/system/libsystem_m.dylib", "machType" : 6, "b" : "7FFF90C33000", "buildId" : "B7F0E2E4277733FCA787D6430B630D54" }, { "path" : "/usr/lib/system/libsystem_malloc.dylib", "machType" : 6, "b" : "7FFF93AE4000", "buildId" : "A695B4E438E9332EA77229D31E3F1385" }, { "path" : "/usr/lib/system/libsystem_network.dylib", "machType" : 6, "b" : "7FFF91C69000", "buildId" : "8B1E1F1DA5CC3BAE8B1EABC84337A364" }, { "path" : "/usr/lib/system/libsystem_notify.dylib", "machType" : 6, "b" : "7FFF9951C000", "buildId" : "9B34B4FEF5AD3F09A5F046AFF3571323" }, { "path" : "/usr/lib/system/libsystem_platform.dylib", "machType" : 6, "b" : "7FFF90CFB000", "buildId" : "3C3D3DA832B9324398ECD89B9A1670B3" }, { "path" : "/usr/lib/system/libsystem_pthread.dylib", "machType" : 6, "b" : "7FFF912DA000", "buildId" : "AB498556B555310E9041F67EC9E00E2C" }, { "path" : "/usr/lib/system/libsystem_sandbox.dylib", "machType" : 6, "b" : "7FFF94842000", "buildId" : "0D0B13EA6B7A3AC8BE60B548543BEB77" }, { "path" : "/usr/lib/system/libsystem_stats.dylib", "machType" : 6, "b" : "7FFF91C91000", "buildId" : "C588E082D94B35109F9A7AD83B3402DE" }, { "path" : "/usr/lib/system/libunc.dylib", "machType" : 6, "b" : "7FFF9812C000", "buildId" : "62682455186236FE8A047A6B91256438" }, { "path" : "/usr/lib/system/libunwind.dylib", "machType" : 6, "b" : "7FFF91CF3000", "buildId" : "78DCC3582FC1302EB3950155B47CB547" }, { "path" : "/usr/lib/system/libxpc.dylib", "machType" : 6, "b" : "7FFF8EDF5000", "buildId" : "AB40CD57F4543FD4B41563B3C0D5C624" }, { "path" : "/usr/lib/libobjc.A.dylib", "machType" : 6, "b" : 
"7FFF94AC1000", "buildId" : "AD7FD984271E30F4A3616B20319EC73B" }, { "path" : "/usr/lib/libauto.dylib", "machType" : 6, "b" : "7FFF92F7D000", "buildId" : "F45C36E8B6063886B5B1B6745E757CA8" }, { "path" : "/usr/lib/libc++abi.dylib", "machType" : 6, "b" : "7FFF9AC25000", "buildId" : "21A807D367323455B77F743E9F916DF0" }, { "path" : "/usr/lib/libDiagnosticMessagesClient.dylib", "machType" : 6, "b" : "7FFF8EDF3000", "buildId" : "4CDB0F7BC0AF3424BC39495696F0DB1E" } ] }}
 mongod(_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE+0x39) [0x10e782479]
 mongod(_ZN5mongo10logContextEPKc+0x100) [0x10e736380]
 mongod(_ZN5mongo13fassertFailedEi+0xD6) [0x10e724ad6]
 mongod(_ZN5mongo12_GLOBAL__N_116mdb_handle_errorEP18__wt_event_handlerP12__wt_sessioniPKc+0xD5) [0x10e5ad165]
 mongod(__wt_eventv+0x4A3) [0x10ebb60b3]
 mongod(__wt_err+0x99) [0x10ebb6209]
 mongod(__wt_illegal_value+0x60) [0x10ebb6820]
 mongod(__wt_bm_read+0x70) [0x10eb316f0]
 mongod(__wt_bt_read+0x89) [0x10eb3f9a9]
 mongod(__wt_btree_tree_open+0x43) [0x10eb3ef03]
 mongod(__wt_btree_open+0xE44) [0x10eb3e4b4]
 mongod(__wt_conn_btree_get+0x268) [0x10eb5e628]
 mongod(__wt_session_get_btree+0x27E) [0x10ebb549e]
 mongod(__wt_session_get_btree_ckpt+0xD1) [0x10ebb51d1]
 mongod(__wt_curfile_open+0x1EE) [0x10eb69b6e]
 mongod(__wt_open_cursor+0x110) [0x10ebb19f0]
 mongod(__wt_curtable_open+0xF7) [0x10eb79ba7]
 mongod(__wt_open_cursor+0x252) [0x10ebb1b32]
 mongod(__session_open_cursor+0x1A8) [0x10ebb27b8]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEy+0x13E) [0x10e5b9a2e]
 mongod(_ZN5mongo16WiredTigerCursor5_initERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEyPNS_22WiredTigerRecoveryUnitE+0x5F) [0x10e5b8baf]
 mongod(_ZN5mongo21WiredTigerRecordStore8IteratorC2ERKS0_PNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionEb+0x5C) [0x10e5b6b7c]
 mongod(_ZNK5mongo21WiredTigerRecordStore11getIteratorEPNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionE+0x6C) [0x10e5b4f5c]
 mongod(_ZN5mongo21WiredTigerRecordStoreC2EPNS_16OperationContextERKNS_10StringDataES5_bxxPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x3C3) [0x10e5b2353]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextERKNS_10StringDataES5_RKNS_17CollectionOptionsE+0x134) [0x10e5ae7c4]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEE+0x12A) [0x10e52ff9a]
 mongod(_ZN5mongo15KVStorageEngineC2EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x98E) [0x10e5333ae]
 mongod(_ZNK5mongo12_GLOBAL__N_117WiredTigerFactory6createERKNS_19StorageGlobalParamsE+0xC3) [0x10e5ac053]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x77) [0x10e2d03c7]
 mongod(_ZN5mongoL14_initAndListenEi+0xC95) [0x10e040645]
 mongod(_ZN5mongo13initAndListenEi+0x13) [0x10e03f463]
 mongod(main+0x3BC) [0x10e0442bc]
 mongod(start+0x34) [0x10e03f444]
 ??? [0x6]
-----  END BACKTRACE  -----
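
The JSON backtrace and the raw address list above appear to describe the same frames: adding each frame's "o" value to its "b" value gives the corresponding address. The sketch below checks this for the first three frames, copied from the log. Reading "b" as a load base and "o" as an offset is an assumption drawn from that arithmetic, and frame_addresses is a hypothetical helper name.

```python
import json

# First three frames copied from the JSON backtrace in the log above.
sample = (
    '{"backtrace":['
    '{"b":"10E03E000","o":"744479"},'
    '{"b":"10E03E000","o":"6F8380"},'
    '{"b":"10E03E000","o":"6E6AD6"}]}'
)


def frame_addresses(doc):
    """Return base + offset for each frame, as lowercase hex strings."""
    frames = json.loads(doc)["backtrace"]
    return [hex(int(fr["b"], 16) + int(fr["o"], 16)) for fr in frames]


addresses = frame_addresses(sample)
# addresses == ['0x10e782479', '0x10e736380', '0x10e724ad6']
```

These three values match the first three entries of the raw address list and the printStackTrace, logContext, and fassertFailed frames in the symbolized trace.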

Comment by Mathias Stearn [ 08/Dec/14 ]

Handled by fix for SERVER-16173
