[SERVER-39824] WiredTiger.wt: encountered an illegal file format or internal value Created: 25/Feb/19  Updated: 14/Mar/19  Resolved: 14/Mar/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.2.15
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Matthew Williams Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: WiredTiger.wt, corrupt, wt-repair-success
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

QEMU/KVM Virtual Machine
Ubuntu 16.04.5


Attachments: WiredTiger, WiredTiger.turtle, WiredTiger.turtle, WiredTiger.wt, WiredTiger.wt, WiredTigerLAS.wt, _mdb_catalog.wt, sizeStorer.wt
Operating System: ALL
Steps To Reproduce:

Run MongoDB on a VM

Restart the VM uncleanly, causing disk corruption

Resolve the disk corruption

Attempt to restart mongod

 Description   

This looks like the same error as https://jira.mongodb.org/browse/SERVER-26103.

I am running a Juju controller that had two other hosts in its replica set. During a mishap the other two were wiped, and after rebooting and attempting to restore the database, I am now getting the following errors:

 

Feb 25 18:34:27 hqosjuju systemd[1]: Started juju state database.
Feb 25 18:34:27 hqosjuju mongod[1410]: 2019-02-25T18:34:27.614+0000 W CONTROL [main] No SSL certificate validation can be performed since no CA file has been provided; please specify an sslCAFile parameter
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] MongoDB starting : pid=1410 port=37017 dbpath=/var/lib/juju/db 64-bit host=hqosjuju
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] db version v3.2.15
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] git version: e11e3c1b9c9ce3f7b4a79493e16f5e4504e01140
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] allocator: tcmalloc
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] modules: none
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] build environment:
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] distarch: x86_64
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] target_arch: x86_64
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] options: { net: { ipv6: true, port: 37017, ssl: { PEMKeyFile: ... } }, replication: { oplogSizeMB: 1024, replSet: "juju" }, security: { authorization: "enabled", keyFile: "/var/lib/juju/shared-secret" }, storage: { dbPath: "/var/lib/juju/db", engine: "wiredTiger", journal: { enabled: ... }, wiredTiger: { engineConfig: { cacheSizeGB:
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] WiredTiger (0) [1551119667:660526][1410:0x7f4db7054bc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 540672: block header checksum of 1881944125 doesn't match expected checksum of 28637535
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] WiredTiger (0) [1551119667:660613][1410:0x7f4db7054bc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] WiredTiger (-31804) [1551119667:660627][1410:0x7f4db7054bc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] Fatal Assertion 28558
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten]

***aborting after fassert() failure
Feb 25 18:34:27 hqosjuju mongod.37017[1410]: [initandlisten] Got signal: 6 (Aborted).

0x12a7701 0x12a6559 0x12a6e81 0x7f4db3ff5390 0x7f4db3c4f428 0x7f4db3c5102a 0x12209f2 0x1000efa 0x6f6234 0x6f6450 0x6f66a8 0x1356faf 0x13574fb 0x1353aed 0x13586c7 0x13721cb 0x13ab4d3 0x14357db 0x1435d1d 0x1435fdc 0x13b9bd1 0x14324f8 0x13f5c0e 0x13f5ceb 0x13a76e9 0xfe4fad 0xfdd474 0xeceb7e 0x73bd14 0x6f73a2 0x7f4db3c3a830 0x736e99
----- BEGIN BACKTRACE -----
{"backtrace":[...,{"b":"400000","o":"EA6559"},{"b":"400000","o":"EA6E81"},{"b":"7F4DB3FE4000","o":"11390"},{"b":"7F4DB3C1A000","o":"35428","s":"gsignal"},{"b":"7F4DB3C1A000","o":"3702A","s":"abort"},{"b":"400000","o":"E209F2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"C00EFA"},{"b":"400000","o":"2F6234","s":"__wt_eventv"},{"b":"400000","o":"2F6450","s"
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x12a7701]
mongod(+0xEA6559) [0x12a6559]
mongod(+0xEA6E81) [0x12a6e81]
libpthread.so.0(+0x11390) [0x7f4db3ff5390]
libc.so.6(gsignal+0x38) [0x7f4db3c4f428]
libc.so.6(abort+0x16A) [0x7f4db3c5102a]
mongod(_ZN5mongo13fassertFailedEi+0xA2) [0x12209f2]
mongod(+0xC00EFA) [0x1000efa]
mongod(__wt_eventv+0x3D7) [0x6f6234]
mongod(__wt_err+0x9D) [0x6f6450]
mongod(__wt_panic+0x24) [0x6f66a8]
mongod(__wt_block_extlist_read+0x8F) [0x1356faf]
mongod(__wt_block_extlist_read_avail+0x2B) [0x13574fb]
mongod(__wt_block_checkpoint_load+0x26D) [0x1353aed]
mongod(+0xF586C7) [0x13586c7]
mongod(__wt_btree_open+0xB3B) [0x13721cb]
mongod(__wt_conn_btree_open+0x163) [0x13ab4d3]
mongod(__wt_session_get_btree+0xFB) [0x14357db]
mongod(__wt_session_get_btree+0x63D) [0x1435d1d]
mongod(__wt_session_get_btree_ckpt+0x14C) [0x1435fdc]
mongod(__wt_curfile_open+0x161) [0x13b9bd1]
mongod(+0x10324F8) [0x14324f8]
mongod(__wt_metadata_cursor_open+0x6E) [0x13f5c0e]
mongod(__wt_metadata_cursor+0x4B) [0x13f5ceb]
mongod(wiredtiger_open+0x1659) [0x13a76e9]
mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_S8_mbbb+0xA6D) [0xfe4fad]
mongod(+0xBDD474) [0xfdd474]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x3FE) [0xeceb7e]
mongod(+0x33BD14) [0x73bd14]
mongod(main+0x732) [0x6f73a2]
libc.so.6(__libc_start_main+0xF0) [0x7f4db3c3a830]
mongod(_start+0x29) [0x736e99]
----- END BACKTRACE -----
Feb 25 18:34:27 hqosjuju systemd[1]: juju-db.service: Main process exited, code=dumped, status=6/ABRT

It was running on a VM that was restarted incorrectly, which caused some disk corruption that has since been resolved. I currently have a backup of the corrupted VM, a backup of the database taken before the VM was corrupted, and a semi-working restored database on a new VM.

I was able to get the database working on a new VM, but Juju has a separate problem there (not MongoDB-related): the X.509 certificate is incorrect, so I cannot use that server. I want to restore to the original VM (which has the right certificates and no remaining disk corruption), but mongod will not start because of the errors above. It looks like the WiredTiger.wt file is corrupt. I have seen multiple forum posts and Jira issues where you repaired this problem but gave no insight into how, so I am posting the files here.

If there is a way to restore a database without starting mongod, I would love to see documentation on it, as so far I have found none. I have both the BSON files from a dump and a restored copy with all of the .wt files (a ton of them). I have attached all of the WiredTiger files, as most previous posts have requested.
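For reference: mongorestore loads a dump by connecting to a running server, so a BSON dump cannot be restored without starting some mongod. One way around a data directory that will not start is to run a fresh mongod against an empty directory and restore into that. The following is a minimal sketch under stated assumptions: FRESH_DBPATH, DUMP_DIR, PORT (37018, chosen only to avoid clashing with juju-db on 37017) and the helper name restore_into_fresh_dbpath are hypothetical, and it deliberately starts a plain standalone without the replSet, keyFile or SSL options from the log above, so Juju's own settings would still need to be reapplied afterwards.

```shell
#!/bin/sh
# Sketch: restore a mongodump into a brand-new, empty data directory.
# FRESH_DBPATH, DUMP_DIR and PORT defaults are hypothetical; adjust before use.
set -eu

FRESH_DBPATH=${FRESH_DBPATH:-/var/lib/juju/db-fresh}
DUMP_DIR=${DUMP_DIR:-/root/dump}
PORT=${PORT:-37018}

restore_into_fresh_dbpath() {
  mkdir -p "$FRESH_DBPATH"
  # Refuse to run against a directory that already holds files, so the
  # damaged dbpath (or anything else) is never reused by mistake.
  if [ -n "$(ls -A "$FRESH_DBPATH")" ]; then
    echo "refusing: $FRESH_DBPATH is not empty" >&2
    return 1
  fi
  # Start a standalone mongod in the background on the empty directory.
  mongod --dbpath "$FRESH_DBPATH" --port "$PORT" \
    --fork --logpath "$FRESH_DBPATH/mongod.log"
  # mongorestore connects to that running server and replays the BSON files.
  mongorestore --port "$PORT" "$DUMP_DIR"
}
```

After sourcing the script, call restore_into_fresh_dbpath with DUMP_DIR pointing at the directory that mongodump produced.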



 Comments   
Comment by Eric Sedor [ 14/Mar/19 ]

Glad to hear it, matthew.williams! To prevent this type of problem in the future, please take note of the following guidelines to help mitigate issues related to unreliable storage layers or server failures.

Comment by Matthew Williams [ 13/Mar/19 ]

This resolved the issue, and it has survived a number of tests. I have other issues now, but those are mostly related to Juju and its database management rather than to MongoDB directly. Thank you all for your help.

Comment by Eric Sedor [ 06/Mar/19 ]

Thanks for keeping us updated; good to hear so far!

Comment by Matthew Williams [ 06/Mar/19 ]

This appears to have successfully repaired the live database. I am running it through some tests and attempting to remove replication before determining whether this was 100% successful.

Comment by Matthew Williams [ 01/Mar/19 ]

This appears to have functioned on the test server. I am testing it on the live server now. 

Comment by Eric Sedor [ 27/Feb/19 ]

Sorry to hear, matthew.williams. It's indeed a good move working off of a copy.

Our next recommendation would be to attempt a similar mongod --repair, but using mongod version 4.0.5 or later. Can you let us know if that helps?
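Because mongod --repair modifies the files in the dbpath it is pointed at, it is safest to run it against a copy, as was done in this ticket. The following is a minimal sketch, assuming juju-db is stopped and a 4.0.5+ mongod binary has been unpacked separately; ORIG_DBPATH, WORK_DBPATH, MONGOD_NEW and the helper names copy_dbpath and repair_copy are hypothetical.

```shell
#!/bin/sh
# Sketch: run mongod --repair on a copy of a damaged dbpath, never the original.
# ORIG_DBPATH, WORK_DBPATH and MONGOD_NEW are hypothetical; MONGOD_NEW should
# point at a 4.0.5+ mongod binary. Stop juju-db before running this.
set -eu

ORIG_DBPATH=${ORIG_DBPATH:-/var/lib/juju/db}
WORK_DBPATH=${WORK_DBPATH:-/var/lib/juju/db-repair}
MONGOD_NEW=${MONGOD_NEW:-/opt/mongodb-4.0.5/bin/mongod}

copy_dbpath() {
  # Never overwrite an earlier working copy.
  if [ -e "$WORK_DBPATH" ]; then
    echo "refusing: $WORK_DBPATH already exists" >&2
    return 1
  fi
  # -a preserves permissions and timestamps (and ownership when run as root).
  cp -a "$ORIG_DBPATH" "$WORK_DBPATH"
}

repair_copy() {
  copy_dbpath || return 1
  # Only the copy is handed to the newer mongod for repair.
  "$MONGOD_NEW" --dbpath "$WORK_DBPATH" --repair
}
```

After sourcing the script, call repair_copy. If the repair runs as root (as in the repair log in this ticket), the rewritten files may end up owned by root, so check ownership before pointing juju-db at them.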

Comment by Matthew Williams [ 27/Feb/19 ]

OK, after a reboot and a few other attempts to repair/restore the database, I am down to this:

 

Feb 27 20:29:48 hqosjuju systemd[1]: Started juju state database.
-- Subject: Unit juju-db.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit juju-db.service has finished starting up.

-- The start-up result is done.
Feb 27 20:29:48 hqosjuju mongod[1527]: 2019-02-27T20:29:48.947+0000 W CONTROL [main] No SSL certificate validation can be performed since no CA file has been provided; please specify an sslCAFile parameter
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] MongoDB starting : pid=1527 port=37017 dbpath=/var/lib/juju/db 64-bit host=hqosjuju
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] db version v3.2.15
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] git version: e11e3c1b9c9ce3f7b4a79493e16f5e4504e01140
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] allocator: tcmalloc
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] modules: none
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] build environment:
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] distarch: x86_64
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] target_arch: x86_64
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] options: { net: { ipv6: true, port: 37017, ssl: { PEMKeyFile: ... } }, replication: { oplogSizeMB: 5
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] Detected unclean shutdown - /var/lib/juju/db/mongod.lock is not empty.
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] Recovering data from the last clean checkpoint.
Feb 27 20:29:48 hqosjuju mongod.37017[1527]: [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path
Feb 27 20:29:49 hqosjuju mongod.37017[1527]: [initandlisten] Fatal Assertion 34433
Feb 27 20:29:49 hqosjuju mongod.37017[1527]: [initandlisten]

***aborting after fassert() failure
Feb 27 20:29:49 hqosjuju systemd[1]: juju-db.service: Main process exited, code=exited, status=14/n/a
Feb 27 20:29:49 hqosjuju systemd[1]: juju-db.service: Unit entered failed state.
Feb 27 20:29:49 hqosjuju systemd[1]: juju-db.service: Failed with result 'exit-code'.
Feb 27 20:29:49 hqosjuju systemd[1]: juju-db.service: Service hold-off time over, scheduling restart.
Feb 27 20:29:49 hqosjuju systemd[1]: Stopped juju state database.
-- Subject: Unit juju-db.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit juju-db.service has finished shutting down.
Feb 27 20:29:49 hqosjuju systemd[1]: juju-db.service: Start request repeated too quickly.
Feb 27 20:29:49 hqosjuju systemd[1]: Failed to start juju state database.
-- Subject: Unit juju-db.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit juju-db.service has failed.

I will await further instructions/repair attempts.

Comment by Matthew Williams [ 27/Feb/19 ]

This has led to this error during the repair:

 

root@server:/home/server# /usr/lib/juju/mongo3.2/bin/mongod --repair --dbpath '/var/lib/juju/db'
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] MongoDB starting : pid=1611 port=27017 dbpath=/var/lib/juju/db 64-bit host=hqosjuju
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] db version v3.2.15
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] git version: e11e3c1b9c9ce3f7b4a79493e16f5e4504e01140
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] modules: none
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] build environment:
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] distarch: x86_64
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-02-27T20:25:02.925+0000 I CONTROL [initandlisten] options: { repair: true, storage: { dbPath: "/var/lib/juju/db" } }
2019-02-27T20:25:02.945+0000 I - [initandlisten] Detected data files in /var/lib/juju/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-02-27T20:25:02.945+0000 W - [initandlisten] Detected unclean shutdown - /var/lib/juju/db/mongod.lock is not empty.
2019-02-27T20:25:02.945+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2019-02-27T20:25:02.945+0000 I STORAGE [initandlisten] Detected WT journal files. Running recovery from last checkpoint.
2019-02-27T20:25:02.945+0000 I STORAGE [initandlisten] journal to nojournal transition config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2019-02-27T20:25:03.018+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2019-02-27T20:25:03.058+0000 I STORAGE [initandlisten] Repairing size cache
2019-02-27T20:25:03.059+0000 I STORAGE [initandlisten] Verify succeeded on uri table:sizeStorer. Not salvaging.
2019-02-27T20:25:03.060+0000 I STORAGE [initandlisten] Repairing catalog metadata
2019-02-27T20:25:03.062+0000 I STORAGE [initandlisten] Verify succeeded on uri table:_mdb_catalog. Not salvaging.
2019-02-27T20:25:03.068+0000 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2019-02-27T20:25:03.068+0000 I CONTROL [initandlisten]
2019-02-27T20:25:03.068+0000 I STORAGE [initandlisten] repairDatabase admin
2019-02-27T20:25:03.069+0000 I STORAGE [initandlisten] Repairing collection admin.system.users
2019-02-27T20:25:03.069+0000 I STORAGE [initandlisten] Verify failed on uri table:collection-13-8833743074834640358. Running a salvage operation.
2019-02-27T20:25:03.069+0000 I - [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 190
2019-02-27T20:25:03.069+0000 I - [initandlisten]

***aborting after invariant() failure

2019-02-27T20:25:03.072+0000 F - [initandlisten] Got signal: 6 (Aborted).

0x12a7701 0x12a6559 0x12a6e81 0x7f02ae3fa390 0x7f02ae054428 0x7f02ae05602a 0x12208c4 0x8b6680 0x8bdb30 0x8c09b0 0xd1f0b1 0x739325 0x73c4c0 0x6f73a2 0x7f02ae03f830 0x736e99
----- BEGIN BACKTRACE -----
{"backtrace":[...,{"b":"400000","o":"EA6559"},{"b":"400000","o":"EA6E81"},{"b":"7F02AE3E9000","o":"11390"},{"b":"7F02AE01F000","o":"35428","s":"gsignal"},{"b":"7F02AE01F000","o":"3702A","s":"abort"},...,{"b":"400000","o":"4B6680","s":"_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE"},{"b":"400000","o":"4BDB30","s":"_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE"},{"b":"400000","o":"4C09B0","s":"_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb"},...,{"b":"400000","o":"339325"},{"b":"400000","o":"33C4C0"},{"b":"400000","o":"2F73A2","s":"main"},...,{"b":"400000","o":"336E99","s":"_start"}],"processInfo":{ "mongodbVersion" : ...
, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "154504DC4E334AF6C6D92A625384E0136F3E5837" }, { "b" : "7FFF8065D000", "elfType" : 3, "buildId" : "EFD9CD6D855DEE3F29B3F5CA8D2C6C1C2B7D7CAB" }, { "b" : "7F02B0FD9000", "path" : "/usr/lib/libtcmalloc.so.4", "elfType" : 3, "buildId" : "C376C112685221C43033ED32DDD1E138B658483B" }, { "b" : "7F02B0DBF000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "8D9BD4CE26E45EF16075C67D5F5EEAFD8B562832" }, { "b" : "7F02B0BB7000", "path" : "/usr/lib/x86_64-linux-gnu/libsnappy.so.1", "elfType" : 3, "buildId" : "CE5C34DFF6A98121F82EED45478616455801DB1D" }, { "b" : "7F02B0939000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.58.0", "elfType" : 3, "buildId" : "9F70D8EB5739EEF251E6DE6DED289027DA61844E" }, { "b" : "7F02B0721000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.58.0", "elfType" : 3, "buildId" : "FC0239AC1E59EB10991A2423B48986ADB010F9D3" }, { "b" : "7F02B04FB000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0", "elfType" : 3, "buildId" : "93798D047035F12B0AA8DE4DDF45DA8636519986" }, { "b" : "7F02B02F7000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0", "elfType" : 3, "buildId" : "3EBF263E88DAE32EE640C9C599A9D7DD59F3DF28" }, { "b" : "7F02B00EF000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.58.0", "elfType" : 3, "buildId" : "34C98695297484FCDE0AD442BEB8412F1B787CE7" }, { "b" : "7F02AFDE7000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_regex.so.1.58.0", "elfType" : 3, "buildId" : "FF52EBC3EA55DD649B9A6D65D8AC7D73CF2B99D7" }, { "b" : "7F02AFBDE000", "path" : "/usr/lib/x86_64-linux-gnu/libpcrecpp.so.0", "elfType" : 3, "buildId" : "96888DAFDBE35DEDCAE309DB9C0943C2C2271EB1" }, { "b" : "7F02AF961000", "path" : "/usr/lib/x86_64-linux-gnu/libyaml-cpp.so.0.5", "elfType" : 3, "buildId" : "D163879ED83585A5B2B58D41CDCB78ABBBD351C5" }, { "b" : "7F02AF6F8000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, 
"buildId" : "473092A9AF373FB0CAB555F9A003BC67F47756B6" }, { "b" : "7F02AF2B3000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "8942CA58A3B910E883CC31E04A23DBD09729B4B0" }, { "b" : "7F02AF0AB000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "69143E8B39040C964D3958490535322675F15DD3" }, { "b" : "7F02AEEA7000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "37BFC3D8F7E3B022DAC7943B1A5FACD40CEBF0AD" }, { "b" : "7F02AEB25000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "C5CA582A8E1EE50F59CAAAF84A1464C8CB3C2F3D" }, { "b" : "7F02AE81C000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "BAD67A84E56E73D031AE507261DA066B35949D34" }, { "b" : "7F02AE606000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F02AE3E9000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "B17C21299099640A6D863E423D99265824E7BB16" }, { "b" : "7F02AE01F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "1CA54A6E0D76188105B12E49FE6B8019BF08803A" }, { "b" : "7F02B124A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "C0ADBAD6F9A33944F2B3567C078EC472A1DAE98E" }, { "b" : "7F02ADE04000", "path" : "/usr/lib/x86_64-linux-gnu/libunwind.so.8", "elfType" : 3, "buildId" : "29F63E3B24F95E955C76E58354DB3294944105A1" }, { "b" : "7F02AD9A2000", "path" : "/usr/lib/x86_64-linux-gnu/libicui18n.so.55", "elfType" : 3, "buildId" : "F5BE69B92B7426F17A61E1A1308115A9FE73C51D" }, { "b" : "7F02AD60E000", "path" : "/usr/lib/x86_64-linux-gnu/libicuuc.so.55", "elfType" : 3, "buildId" : "463D8B610702D64AE0803C7DFCAA02CFB4C6477B" }, { "b" : "7F02AD39E000", "path" : "/lib/x86_64-linux-gnu/libpcre.so.3", "elfType" : 3, "buildId" : "390B2228E9A1071BB0BE285D77B6669CB37CE628" }, { "b" : "7F02AD17C000", "path" : 
"/lib/x86_64-linux-gnu/liblzma.so.5", "elfType" : 3, "buildId" : "15AED4855920E5A0FB8791B683EB88C7E1199260" }, { "b" : "7F02AB6C5000", "path" : "/usr/lib/x86_64-linux-gnu/libicudata.so.55", "elfType" : 3, "buildId" : "2CC92B3EC41116DD818F3DAF7DA7CBB410183DD6" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x12a7701]
mongod(+0xEA6559) [0x12a6559]
mongod(+0xEA6E81) [0x12a6e81]
libpthread.so.0(+0x11390) [0x7f02ae3fa390]
libc.so.6(gsignal+0x38) [0x7f02ae054428]
libc.so.6(abort+0x16A) [0x7f02ae05602a]
mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x114) [0x12208c4]
mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE+0xE0) [0x8b6680]
mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE+0x480) [0x8bdb30]
mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb+0xAC0) [0x8c09b0]
mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0xDC1) [0xd1f0b1]
mongod(+0x339325) [0x739325]
mongod(+0x33C4C0) [0x73c4c0]
mongod(main+0x732) [0x6f73a2]
libc.so.6(__libc_start_main+0xF0) [0x7f02ae03f830]
mongod(_start+0x29) [0x736e99]
----- END BACKTRACE -----
Aborted (core dumped)

Mind you, I am doing all of this on a clone of the original so I won't ruin the original. If this fails, I need to know how to restore a copy of the DB to the same host. I have a BSON/JSON dump from before this issue happened; however, I cannot run a restore because the database won't start up, and I am not sure how to handle that. I have not found any documentation or forum posts that explain how to do that, which is confusing since I believe it's likely an issue others have had.

Comment by Matthew Williams [ 27/Feb/19 ]

-- Unit juju-db.service has finished starting up.

-- The start-up result is done.
Feb 27 19:31:24 hqosjuju mongod[1624]: 2019-02-27T19:31:24.981+0000 W CONTROL [main] No SSL certificate validation can be performed since no CA file has been provided; please specify an sslCAFile parameter
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] MongoDB starting : pid=1624 port=37017 dbpath=/var/lib/juju/db 64-bit host=hqosjuju
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] db version v3.2.15
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] git version: e11e3c1b9c9ce3f7b4a79493e16f5e4504e01140
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] allocator: tcmalloc
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] modules: none
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] build environment:
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] distarch: x86_64
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] target_arch: x86_64
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] options: { net: { ipv6: true, port: 37017, ssl: { PEMKeyFile: ... } }, replication: { oplogSizeMB: 1
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] WiredTiger (-31802) [1551295885:228871][1624:0x7fd915cdebc0], file:sizeStorer.wt, WT_SESSION.open_cursor: unable to read root page from file:sizeStorer.wt: WT_ERROR: non-spe
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_size_storer.cpp 67
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten]
0x12a7701 0x123d053 0x1222c52 0xffc703 0xfe518a 0xfdd474 0xeceb7e 0x73bd14 0x6f73a2 0x7fd9128c4830 0x736e99
----- BEGIN BACKTRACE -----
{"backtrace":[...,{"b":"400000","o":"E3D053","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"E22C52","s":"_ZN5mongo17invaria
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x12a7701]
mongod(_ZN5mongo10logContextEPKc+0x183) [0x123d053]
mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x102) [0x1222c52]
mongod(_ZN5mongo20WiredTigerSizeStorerC2EP15__wt_connectionRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1D3) [0xffc703]
mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_S8_mbbb+0xC4A) [0xfe518a]
mongod(+0xBDD474) [0xfdd474]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x3FE) [0xeceb7e]
mongod(+0x33BD14) [0x73bd14]
mongod(main+0x732) [0x6f73a2]
libc.so.6(__libc_start_main+0xF0) [0x7fd9128c4830]
mongod(_start+0x29) [0x736e99]
----- END BACKTRACE -----
Feb 27 19:31:25 hqosjuju mongod.37017[1624]: [initandlisten]

***aborting after invariant() failure
Feb 27 19:31:25 hqosjuju systemd[1]: juju-db.service: Main process exited, code=exited, status=14/n/a
Feb 27 19:31:25 hqosjuju systemd[1]: juju-db.service: Unit entered failed state.
Feb 27 19:31:25 hqosjuju systemd[1]: juju-db.service: Failed with result 'exit-code'.
Feb 27 19:31:25 hqosjuju systemd[1]: juju-db.service: Service hold-off time over, scheduling restart.
Feb 27 19:31:25 hqosjuju systemd[1]: Stopped juju state database.
-- Subject: Unit juju-db.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit juju-db.service has finished shutting down.
Feb 27 19:31:25 hqosjuju systemd[1]: juju-db.service: Start request repeated too quickly.
Feb 27 19:31:25 hqosjuju systemd[1]: Failed to start juju state database.
-- Subject: Unit juju-db.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit juju-db.service has failed.

-- The result is failed.

I am getting this error at the moment. I am going to try mongod --repair to resolve it and research further.

Comment by Matthew Williams [ 27/Feb/19 ]

Thank you, testing now.

Comment by Eric Sedor [ 27/Feb/19 ]

Thanks for the detailed report, matthew.williams. I've attached a repair attempt of the files you provided (dated Feb 26 2019 04:12:06 PM GMT-0800). Please replace them in your $dbpath and let us know if it resolves the issue.

Sincerely,
Eric
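Swapping supplied files into a dbpath is easier to undo when the originals are kept alongside. The following is a minimal sketch, assuming juju-db is stopped and the repaired attachments were downloaded into a single directory; DBPATH, REPAIRED_DIR, BACKUP_DIR, the helper name swap_in_repaired_files and the file list (taken from this ticket's attachment names) are assumptions. Only files actually present in REPAIRED_DIR are replaced.

```shell
#!/bin/sh
# Sketch: swap repaired WiredTiger metadata files into a dbpath, keeping the originals.
# DBPATH, REPAIRED_DIR, BACKUP_DIR and the file list are assumptions; stop mongod first.
set -eu

DBPATH=${DBPATH:-/var/lib/juju/db}
REPAIRED_DIR=${REPAIRED_DIR:-/tmp/repaired}
BACKUP_DIR=${BACKUP_DIR:-$DBPATH.orig-metadata}

swap_in_repaired_files() {
  mkdir -p "$BACKUP_DIR"
  for f in WiredTiger WiredTiger.turtle WiredTiger.wt WiredTigerLAS.wt _mdb_catalog.wt sizeStorer.wt; do
    # Skip any file for which no repaired copy was supplied.
    [ -f "$REPAIRED_DIR/$f" ] || continue
    # Keep the current version before overwriting it.
    if [ -f "$DBPATH/$f" ]; then
      cp -p "$DBPATH/$f" "$BACKUP_DIR/$f"
    fi
    cp -p "$REPAIRED_DIR/$f" "$DBPATH/$f"
  done
}
```

The copied files keep the ownership of the downloaded copies unless cp runs as root, so it is worth checking that the mongod service user can read and write them before restarting juju-db.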

Generated at Thu Feb 08 04:53:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.