[SERVER-37712] Not able restart MongoDB service after stopped Created: 23/Oct/18  Updated: 14/Nov/18  Resolved: 14/Nov/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Waqas Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File WiredTiger     File WiredTiger.lock     File WiredTiger.turtle     File WiredTiger.wt     File WiredTiger.wt.1     File WiredTigerLAS.wt     Text File log.txt     File sizeStorer.wt     File storage.bson     File wiredtiger.tar    
Operating System: ALL
Participants:

 Description   

We tried to re-start mongo db service and it is not restarting which seems like the WiredTiger.wt file got corrupted which we are not able to recover. Below is the error stack;

 

root@ip-172-31-26-92:/data# mongod
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] MongoDB starting : pid=23951 port=27017 dbpath=/data/db 64-bit host=ip-172-31-26-92
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] db version v3.2.12
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] git version: ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] allocator: tcmalloc
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] modules: none
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] build environment:
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] distarch: x86_64
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] target_arch: x86_64
2018-10-23T10:31:07.775+0000 I CONTROL [initandlisten] options: {}
2018-10-23T10:31:07.800+0000 I - [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-10-23T10:31:07.800+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-10-23T10:31:07.806+0000 E STORAGE [initandlisten] WiredTiger (-31802) [1540290667:806369][23951:0x7faaa92bdc80], file:WiredTiger.wt, connection: /data/db/WiredTiger.wt: handle-read: pread: failed to read 4096 bytes at offset 45056: WT_ERROR: non-specific WiredTiger error
2018-10-23T10:31:07.806+0000 I - [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error
2018-10-23T10:31:07.806+0000 I STORAGE [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-10-23T10:31:07.806+0000 I CONTROL [initandlisten] dbexit: rc: 100



 Comments   
Comment by Danny Hatcher (Inactive) [ 06/Nov/18 ]

Hello Waqas,

This error is related to the journal. I recommend deleting the files in the journal folder starting with WiredTigerLog.0000000040 and anything after that file. Then try to start the system again. Please note that this may cause data loss if there is data present in the journal that does not exist on disk.

Thank you,

Danny

Comment by Waqas [ 06/Nov/18 ]

Now error is i cant get db names

2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] MongoDB starting : pid=12764 port=27017 dbpath=/data/db 64-bit host=ip-172-31-26-92
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] db version v3.2.12
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] git version: ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] allocator: tcmalloc
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] modules: none
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] build environment:
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] distmod: ubuntu1604
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] distarch: x86_64
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] target_arch: x86_64
 2018-11-01T11:40:17.495+0000 I CONTROL [initandlisten] options: {}
 2018-11-01T11:40:17.520+0000 I - [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
 2018-11-01T11:40:17.520+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
 2018-11-01T11:40:17.555+0000 E STORAGE [initandlisten] WiredTiger (2) [1541072417:555836][12764:0x7f29d9b36c80], txn-recover: /data/db/journal/WiredTigerLog.0000000040: handle-open: open: No such file or directory
 2018-11-01T11:40:17.555+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:555890][12764:0x7f29d9b36c80], txn-recover: WiredTiger is unable to read the recovery log.
 2018-11-01T11:40:17.555+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:555910][12764:0x7f29d9b36c80], txn-recover: This may be due to the log files being encrypted, being from an older version or due to corruption on disk
 2018-11-01T11:40:17.555+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:555917][12764:0x7f29d9b36c80], txn-recover: You should confirm that you have opened the database with the correct options including all encryption and compression options
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (2) [1541072417:556059][12764:0x7f29d9b36c80], WT_CURSOR.search: /data/db/journal/WiredTigerLog.0000000040: handle-open: open: No such file or directory
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:556086][12764:0x7f29d9b36c80], WT_CURSOR.search: WiredTiger is unable to read the recovery log.
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:556095][12764:0x7f29d9b36c80], WT_CURSOR.search: This may be due to the log files being encrypted, being from an older version or due to corruption on disk
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:556102][12764:0x7f29d9b36c80], WT_CURSOR.search: You should confirm that you have opened the database with the correct options including all encryption and compression options
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (2) [1541072417:556536][12764:0x7f29d9b36c80], txn-recover: /data/db/journal/WiredTigerLog.0000000040: handle-open: open: No such file or directory
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:556552][12764:0x7f29d9b36c80], txn-recover: WiredTiger is unable to read the recovery log.
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:556562][12764:0x7f29d9b36c80], txn-recover: This may be due to the log files being encrypted, being from an older version or due to corruption on disk
 2018-11-01T11:40:17.556+0000 E STORAGE [initandlisten] WiredTiger (0) [1541072417:556570][12764:0x7f29d9b36c80], txn-recover: You should confirm that you have opened the database with the correct options including all encryption and compression options
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten]
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten]
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten]
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
 2018-11-01T11:40:17.577+0000 I CONTROL [initandlisten]
 2018-11-01T11:40:17.578+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
 2018-11-01T11:40:17.578+0000 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
 2018-11-01T11:40:17.578+0000 I NETWORK [initandlisten] waiting for connections on port 27017

Comment by Waqas [ 31/Oct/18 ]

Dear Daniel,

Thanks for your effort, the file actually worked and it started the mongo db successfully. However, the dbs are their but their collections are not showing up (only log)

My colleague is going to restore his backup tar file of the data and will try again. If anything comes up, I will update you.

Thanks and regards

Comment by Waqas [ 28/Oct/18 ]

Thanks Daniel,

I will try this out and let you know

Regards,

Comment by Danny Hatcher (Inactive) [ 26/Oct/18 ]

Hello Waqas,

I've attached a repair attempt of the files you provided. Please extract these files, replace them in your $dbpath, and let us know if it resolves the issue.

wiredtiger.tar

Thank you,

Danny

Comment by Waqas [ 25/Oct/18 ]

Thanks for your response Daniel.

Unfortunately no. Please let me know if there is any way we can use the data files without the WiredTiger file 

Regards,

Waqas

Comment by Danny Hatcher (Inactive) [ 24/Oct/18 ]

Hello Waqas,

Do you have the mongod logs from before the restart? It may tell us why the process didn't cleanly shut down.

Comment by Waqas [ 23/Oct/18 ]

Dear Daniel,

 

 

We restarted the process by using the following command while it was running

sudo service mongodb restart

We thought the above command would cleanly shutdown and restart the process but it didn't. Just for information, at the time of restarting, may be another guy was connected to the mongo db service through another machine. We did not use the kill process command, neither does the process crashed itself nor does the hardware caused any crash.

Unfortunately we don't have a replica of WiredTiger.wt file. However, after unsuccessful launch of service, we compressed the database folder using TAR not mongo dump. If necessary, we can share the TAR file as well.

We definitely plan to upgrade the MongoDB once this issue is resolved

Thanks and regards,

Waqas

Comment by Danny Hatcher (Inactive) [ 23/Oct/18 ]

Hello,

Can you please describe a little more detail as to how you restarted the process? Did you cleanly shutdown the process and then attempt to bring it back up? Did you run a kill -9? Did the process crash? Did the hardware? The context will most likely determine the cause.

Regardless, if you have a replica set you should be able to initial sync this node and continue running.

Please note that MongoDB 3.2 has reached end-of-life as of September so we strongly recommend upgrading to 3.4, 3.6, or 4.0 as soon as feasible.

Generated at Thu Feb 08 04:46:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.