[SERVER-37082] Cannot start mongod or --repair (caused by unclean shutdown) Created: 12/Sep/18  Updated: 12/Sep/18  Resolved: 12/Sep/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Chloe Chen
Assignee: Nick Brewer
Resolution: Done
Votes: 0
Labels: envm, rpo, trct, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.wt    
Operating System: Linux
Participants:

 Description   

Hello!

We have a production server that crashed unexpectedly, and we can't get it to start because of an error in WiredTiger.wt.

Running ./mongod --dbpath /data/db

gives me:

2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] MongoDB starting : pid=10897 port=27017 dbpath=/slowfs/vginfra3/chloec/usage55/db 64-bit host=vgzeburt55
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] db version v3.6.2
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] git version: 489d177dbd0f0420a8ca04d39fd78d0a2c539420
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] allocator: tcmalloc
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] modules: none
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] build environment:
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] distarch: x86_64
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] target_arch: x86_64
2018-09-11T16:51:00.892-0700 I CONTROL [initandlisten] options: { storage: { dbPath: "/slowfs/vginfra3/chloec/usage55/db" } }
2018-09-11T16:51:00.912-0700 W - [initandlisten] Detected unclean shutdown - /slowfs/vginfra3/chloec/usage55/db/mongod.lock is not empty.
2018-09-11T16:51:00.917-0700 I - [initandlisten] Detected data files in /slowfs/vginfra3/chloec/usage55/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-09-11T16:51:00.919-0700 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2018-09-11T16:51:00.922-0700 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=31494M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-09-11T16:51:00.944-0700 E STORAGE [initandlisten] WiredTiger error (0) [1536709860:944467][10897:0x7ffff7f789c0], file:WiredTiger.wt, connection: WiredTiger.turtle: encountered an illegal file format or internal value: (__wt_turtle_read, 291)
2018-09-11T16:51:00.944-0700 E STORAGE [initandlisten] WiredTiger error (-31804) [1536709860:944502][10897:0x7ffff7f789c0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-09-11T16:51:00.944-0700 F - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2018-09-11T16:51:00.944-0700 F - [initandlisten]

***aborting after fassert() failure

2018-09-11T16:51:00.961-0700 F - [initandlisten] Got signal: 6 (Aborted).

0x555557702681 0x555557701899 0x555557701d7d 0x7ffff70905e0 0x7ffff6cf31f7 0x7ffff6cf48e8 0x555555ec97bf 0x555555f91c5e 0x555555e67917 0x555555e67b33 0x555555e67e5c 0x555555fbd5d9 0x555555fbbc59 0x555555fa189a 0x555555fee4ab 0x555555fee9cd 0x555555feec8c 0x5555560572f2 0x555555fe5238 0x555555fbb32e 0x555555fbb40b 0x555555fa0604 0x555555f75fb9 0x555555f5a5b4 0x55555612aee7 0x555555e636c7 0x555555f393bc 0x555555ecb489 0x7ffff6cdfc05 0x555555f28d11
----- BEGIN BACKTRACE -----

{"backtrace":[{"b":"555555554000","o":"21AE681","s":"_ZN5mongo15printStackTraceERSo"},{"b":"555555554000","o":"21AD899"},{"b":"555555554000","o":"21ADD7D"},{"b":"7FFFF7081000","o":"F5E0"},{"b":"7FFFF6CBE000","o":"351F7","s":"gsignal"},{"b":"7FFFF6CBE000","o":"368E8","s":"abort"},{"b":"555555554000","o":"9757BF","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"555555554000","o":"A3DC5E"},{"b":"555555554000","o":"913917","s":"__wt_eventv"},{"b":"555555554000","o":"913B33","s":"__wt_err"},{"b":"555555554000","o":"913E5C","s":"__wt_panic"},{"b":"555555554000","o":"A695D9","s":"__wt_turtle_read"},{"b":"555555554000","o":"A67C59","s":"__wt_metadata_search"},{"b":"555555554000","o":"A4D89A","s":"__wt_conn_dhandle_open"},{"b":"555555554000","o":"A9A4AB","s":"__wt_session_get_dhandle"},{"b":"555555554000","o":"A9A9CD","s":"__wt_session_get_dhandle"},{"b":"555555554000","o":"A9AC8C","s":"__wt_session_get_btree_ckpt"},{"b":"555555554000","o":"B032F2","s":"__wt_curfile_open"},{"b":"555555554000","o":"A91238"},{"b":"555555554000","o":"A6732E","s":"__wt_metadata_cursor_open"},{"b":"555555554000","o":"A6740B","s":"__wt_metadata_cursor"},{"b":"555555554000","o":"A4C604","s":"wiredtiger_open"},{"b":"555555554000","o":"A21FB9","s":"_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb"},{"b":"555555554000","o":"A065B4"},{"b":"555555554000","o":"BD6EE7","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"555555554000","o":"90F6C7"},{"b":"555555554000","o":"9E53BC","s":"_ZN5mongo11mongoDbMainEiPPcS1_"},{"b":"555555554000","o":"977489","s":"main"},{"b":"7FFFF6CBE000","o":"21C05","s":"__libc_start_main"},{"b":"555555554000","o":"9D4D11"}],"processInfo":{ "mongodbVersion" : "3.6.2", "gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-693.11.6.el7.x86_64", "version" : "#1 SMP Thu Jan 4 01:06:37 UTC 2018", "machine" : "x86_64" }
, "somap" : [ { "b" : "555555554000", "elfType" : 3, "buildId" : "454A81637B4013AE082538A57B34F8A42B39277A" }, { "b" : "7FFFF88FA000", "elfType" : 3, "buildId" : "33DEC63F3B0D3EE9ABDAC478FF3E7F1F43FAF9DE" }, { "b" : "7FFFF7BC1000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FF4E72F4E574E143330FB3C66DB51613B0EC65EA" }, { "b" : "7FFFF79B9000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "6D322588B36D2617C03C0F3B93677E62FCFFDA81" }, { "b" : "7FFFF77B5000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "1E42EBFB272D37B726F457D6FE3C33D2B094BB69" }, { "b" : "7FFFF74B3000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "808BD35686C193F218A5AAAC6194C49214CFF379" }, { "b" : "7FFFF729D000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7FFFF7081000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "A48D21B2578A8381FBD8857802EAA660504248DC" }, { "b" : "7FFFF6CBE000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "95FF02A4BEBABC573C7827A66D447F7BABDDAA44" }, { "b" : "7FFFF7DDB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "22FA66DA7D14C88BF36C69454A357E5F1DEFAE4E" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x555557702681]
mongod(+0x21AD899) [0x555557701899]
mongod(+0x21ADD7D) [0x555557701d7d]
libpthread.so.0(+0xF5E0) [0x7ffff70905e0]
libc.so.6(gsignal+0x37) [0x7ffff6cf31f7]
libc.so.6(abort+0x148) [0x7ffff6cf48e8]
mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x555555ec97bf]
mongod(+0xA3DC5E) [0x555555f91c5e]
mongod(__wt_eventv+0x3D7) [0x555555e67917]
mongod(__wt_err+0x9D) [0x555555e67b33]
mongod(__wt_panic+0x33) [0x555555e67e5c]
mongod(__wt_turtle_read+0x269) [0x555555fbd5d9]
mongod(__wt_metadata_search+0xA9) [0x555555fbbc59]
mongod(__wt_conn_dhandle_open+0x8A) [0x555555fa189a]
mongod(__wt_session_get_dhandle+0xFB) [0x555555fee4ab]
mongod(__wt_session_get_dhandle+0x61D) [0x555555fee9cd]
mongod(__wt_session_get_btree_ckpt+0x14C) [0x555555feec8c]
mongod(__wt_curfile_open+0x52) [0x5555560572f2]
mongod(+0xA91238) [0x555555fe5238]
mongod(__wt_metadata_cursor_open+0x6E) [0x555555fbb32e]
mongod(__wt_metadata_cursor+0x4B) [0x555555fbb40b]
mongod(wiredtiger_open+0x19D4) [0x555555fa0604]
mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb+0x889) [0x555555f75fb9]
mongod(+0xA065B4) [0x555555f5a5b4]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x637) [0x55555612aee7]
mongod(+0x90F6C7) [0x555555e636c7]
mongod(_ZN5mongo11mongoDbMainEiPPcS1_+0x86C) [0x555555f393bc]
mongod(main+0x9) [0x555555ecb489]
libc.so.6(__libc_start_main+0xF5) [0x7ffff6cdfc05]
mongod(+0x9D4D11) [0x555555f28d11]
----- END BACKTRACE -----
Abort

 

Running with --repair doesn't seem to make much of a difference.

I've seen several other cases in Jira, but it seems like each case required a manual investigation.

Thanks!
Chloe



 Comments   
Comment by Nick Brewer [ 12/Sep/18 ]

chloec Glad to hear you were able to get it working from a backup. Some suggestions to keep in mind for the future:

-Nick

Comment by Chloe Chen [ 12/Sep/18 ]

It's a VM. Fortunately, we just found that we have an earlier data backup. Thanks for the help all the same. I will keep your suggestions in mind.

 

Thanks,

Chloe

Comment by Nick Brewer [ 12/Sep/18 ]

chloec A blank .turtle file indicates corruption. Because this file contains metadata that is used to interpret other WiredTiger files, we will not be able to perform a repair if the file is blank. In this case, your best option is to use any available backups you have.
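
As a rough illustration of the check described above, the following Python sketch reports which WiredTiger metadata files in a dbpath are missing or zero-length. It is not part of MongoDB or WiredTiger; the function name `blank_metadata_files` is hypothetical, and "blank" is assumed here to mean a file size of zero bytes.

```python
from pathlib import Path


def blank_metadata_files(dbpath, names=("WiredTiger.turtle", "WiredTiger.wt")):
    """Return the metadata file names that are missing or zero-length.

    Hypothetical helper: it only inspects file sizes and does not
    validate the contents of the files in any way.
    """
    problems = []
    for name in names:
        path = Path(dbpath) / name
        # Treat a missing file the same as an empty one: either way
        # there is no metadata available to drive a repair attempt.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

Calling `blank_metadata_files("/slowfs/vginfra3/chloec/usage55/db")` before a repair attempt would show whether restoring from a backup is the more realistic route. A non-empty file can still be corrupt, so an empty result does not mean the files are healthy.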

Based on this line, I assume you're using slowfs:

{ dbPath: "/slowfs/vginfra3/chloec/usage55/db" }

I'm not familiar with how MongoDB performs with slowfs; however, it's worth noting that MongoDB requires fsync on directories. With that in mind, you may want to ensure you're using the appropriate FsyncStrategy option for slowfs, as outlined on its GitHub.
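
To make the directory fsync requirement concrete, here is a minimal Python sketch that opens a directory and calls fsync on it, which is one way to probe whether the filesystem under a dbpath accepts the call at all. It is an assumption-laden illustration, not how mongod performs the operation internally; the function names `fsync_directory` and `directory_fsync_supported` are hypothetical, and the sketch assumes a POSIX system such as Linux.

```python
import os


def fsync_directory(dirpath):
    """Open a directory read-only and fsync it.

    Raises OSError if the directory cannot be opened or if the
    filesystem rejects fsync on a directory descriptor.
    """
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def directory_fsync_supported(dirpath):
    """Return True if fsync on the directory completes without error."""
    try:
        fsync_directory(dirpath)
        return True
    except OSError:
        return False
```

A `True` result only shows that the call returned successfully. It does not prove the filesystem actually made the directory entry durable, which is the part that matters after an unclean shutdown.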

For tracking purposes, could you clarify whether this machine is a VM, native, container, etc?

Thanks,
-Nick

Comment by Chloe Chen [ 12/Sep/18 ]

My WiredTiger.turtle file is blank...

I am using mongodb-linux-x86_64-3.6.2 on CentOS 7.3.

 

Comment by Nick Brewer [ 12/Sep/18 ]

chloec To perform a repair attempt, we'd need both the WiredTiger.wt and WiredTiger.turtle files. Additionally, we'd need to confirm:

  • The operating system and version
  • The platform (virtual machine, container, native hardware, etc)

Thanks,
-Nick

Generated at Thu Feb 08 04:44:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.