[SERVER-29563] mongod server fails to start and hangs Created: 12/Jun/17  Updated: 20/Sep/17  Resolved: 28/Aug/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Abhishek Vaid Assignee: Mark Agarunov
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagnostic.data.tar.gz     Text File mongod.log    
Operating System: ALL
Participants:

 Description   

Few days back my mongod crashed. It's a single node machine storing about 1 TB of data. The machine is ubuntu 16.04 and mongodb version is 3.4.2. The directory has all files intact.

When I try to start mongod again, I see these log entries:

2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] MongoDB starting : pid=6017 port=28018 dbpath=/data-raid/mongo/data 64-bit host=mongo-vm08
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] db version v3.4.2
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] modules: none
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] build environment:
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-06-12T12:36:21.649+0000 I CONTROL  [initandlisten] options: { net: { port: 28018 }, storage: { dbPath: "/data-raid/mongo/data", directoryPerDB: true }, systemLog: { destination: "file", path: "/data-raid/mongo/logs/mongodb.log" } }
2017-06-12T12:36:21.707+0000 I -        [initandlisten] Detected data files in /data-raid/mongo/data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-06-12T12:36:21.707+0000 I STORAGE  [initandlisten] 
2017-06-12T12:36:21.707+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-06-12T12:36:21.707+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-06-12T12:36:21.707+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7507M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),

  • This is it. Nothing else happens. The mongod stays like this for hours. When I use --repair option, this behavior persists.
  • The reason I'm using different port other than 27017 is because another mongod is running.


 Comments   
Comment by Kelsey Schubert [ 28/Aug/17 ]

Hi vaidabhishek,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Thomas

Comment by Mark Agarunov [ 15/Jun/17 ]

Hello vaidabhishek,

Looking at the output you've provided, it does appear that this hang is due to recovery running:

2017-06-13T19:28:42.828+0000 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.

Has the workaround by disabling time synchroniztion or leaving mongod running to finish the recovery had any success?

Thanks,
Mark

Comment by Abhishek Vaid [ 15/Jun/17 ]

Any updates on this ?

Comment by Abhishek Vaid [ 13/Jun/17 ]

Mark,

I upgraded mongo to 3.4.4 and tried to bring up mongod again. Here is how logs are looking. I'm running it in a screen session with log output coming on stdout. I've copied this output after 10 minutes of server restart. As you can see, nothing seems to be happening. *The first line is the command I ran. *

frrole@mongo-vm08:/usr/bin$ sudo mongod --directoryperdb  --dbpath /data-raid/mongo/data --port 28018  -vvvvv --repair
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] MongoDB starting : pid=29581 port=28018 dbpath=/data-raid/mongo/data 64-bit host=mongo-vm08
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] db version v3.4.4
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] git version: 888390515874a9debd1b6c5d36559ca86b44babd
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] modules: none
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten] build environment:
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-06-13T19:28:42.770+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-06-13T19:28:42.771+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-06-13T19:28:42.771+0000 I CONTROL  [initandlisten] options: { net: { port: 28018 }, repair: true, storage: { dbPath: "/data-raid/mongo/data", directoryPerDB: true }, systemLog: { verbosity: 5 } }
2017-06-13T19:28:42.771+0000 D NETWORK  [initandlisten] fd limit hard:65536 soft:1024 max conn: 819
2017-06-13T19:28:42.828+0000 I -        [initandlisten] Detected data files in /data-raid/mongo/data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-06-13T19:28:42.828+0000 I STORAGE  [initandlisten]
2017-06-13T19:28:42.828+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-06-13T19:28:42.828+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-06-13T19:28:42.828+0000 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2017-06-13T19:28:42.828+0000 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=7507M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),

Comment by Mark Agarunov [ 13/Jun/17 ]

Hello vaidabhishek,

Thank you for providing these files. Looking over the diagnostic data and the description of the behavior you've given, the hang during startup may be due to the recovery process. It seems that a checkpoint hadn't completed in a long time before mongod shut down, and this is now causing the delay. Unfortunately the version of MongoDB you are using (3.4.2) does not log the recovery process, however newer versions, as of 3.4.3, do. My recommendation would be to upgrade if possible and attempt to start mongod with verbose logging enabled again and provide us with the logs that are generated. This should let us know exactly what is happening during the recovery.

Additionally, the reason a checkpoint hadn't completed before shutdown may be due to a time sync issue if you are using Hyper-V. Until there is a fix for this in a newer release, you may be able to work around the issue by disabling time synchronization.

Thanks,
Mark

Comment by Abhishek Vaid [ 13/Jun/17 ]

These are the files you asked. I started my mongod with following command:

sudo mongod --directoryperdb --dbpath /data-raid/mongo/data --port 28018 --logpath /data-raid/mongo/logs/mongodb.log -vvvvv

Comment by Mark Agarunov [ 12/Jun/17 ]

Hello vaidabhishek,

Thank you for the report. To get a better idea of what may be causing the behavior you're seeing, I'd like to request the following:

  • The complete logs from mongod with the verbosity set to 5
  • Please archive and upload the $dbpath/diagnostic.data directory.

This should give us some more information to better investigate the issue.

Thanks,
Mark

Generated at Thu Feb 08 04:21:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.