[SERVER-33178] Server Crashes on Startup Created: 07/Feb/18  Updated: 27/Oct/23  Resolved: 09/Feb/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.10, 3.4.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chris Spencer Assignee: Mark Agarunov
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I recently upgraded an Ubuntu 14 server, running Mongo 3.4.10, to Ubuntu 16. I re-installed the identical version of mongodb-org from your PPA, but now when I start mongo, it's doesn't respond to any connections, and I see this error in `/var/log/mongodb/mongodb.log`:

    2018-02-06T18:40:15.680+0000 I CONTROL  [main] ***** SERVER RESTARTED *****
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] MongoDB starting : pid=15925 port=27017 dbpath=/var/lib/mongodb 64-bit host=proddb1
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] db version v3.4.11
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] git version: 34f5bec2c9d827d71828fe858167f89a28b29a2a
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] allocator: tcmalloc
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] modules: none
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] build environment:
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten]     distarch: x86_64
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten]     target_arch: x86_64
    2018-02-06T18:40:15.684+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongodb.conf", net: { bindIp: "127.0.0.1" }, storage: { dbPath: "/var/lib/mongodb", engine: "wiredTiger", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongodb.log" } }
    2018-02-06T18:40:15.684+0000 W -        [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
    2018-02-06T18:40:15.709+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
    2018-02-06T18:40:15.709+0000 I STORAGE  [initandlisten] 
    2018-02-06T18:40:15.709+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
    2018-02-06T18:40:15.709+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
    2018-02-06T18:40:15.709+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=15574M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),
    2018-02-06T18:40:16.468+0000 I STORAGE  [initandlisten] WiredTiger message [1517942416:468088][15925:0x7f4d31632d00], txn-recover: Main recovery loop: starting at 12168/128
    2018-02-06T18:40:16.468+0000 I STORAGE  [initandlisten] WiredTiger message [1517942416:468689][15925:0x7f4d31632d00], txn-recover: Recovering log 12168 through 12169
    2018-02-06T18:40:16.528+0000 I STORAGE  [initandlisten] WiredTiger message [1517942416:528271][15925:0x7f4d31632d00], txn-recover: Recovering log 12169 through 12169
    2018-02-06T18:40:17.875+0000 E STORAGE  [initandlisten] WiredTiger error (24) [1517942417:875171][15925:0x7f4d31632d00], file:collection-43442-4253276309270751377.wt, WT_SESSION.open_cursor: /var/lib/mongodb/collection-43442-4253276309270751377.wt: handle-open: open: Too many open files
    2018-02-06T18:40:17.875+0000 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: 24: Too many open files at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 95
    2018-02-06T18:40:17.875+0000 I -        [initandlisten] 
 
    ***aborting after invariant() failure

Googling, I found some people suggesting I need to increase my limits, so I tried modifying sudo nano /etc/security/limits.conf by adding:

  • soft fsize unlimited
  • hard fsize unlimited
  • soft cpu unlimited
  • hard cpu unlimited
  • soft as unlimited
  • hard as unlimited
  • soft nofile 64000
  • hard nofile 64000
  • soft rss unlimited
  • hard rss unlimited
  • soft nproc 64000
  • hard nproc 64000

I also added `session required pam_limits.so` to `/etc/pam.d/common-session` and `/etc/pam.d/common-session-noninteractive`, and then rebooted. However, I get the same error.

Think it might be data corruption, I tried running `sudo -u mongodb mongod --repair --dbpath /var/lib/mongodb` but after several minutes of rebuilding indexes, that too failed with a similar crash report:

    2018-02-07T21:55:46.255+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-43442-4253276309270751377. Not salvaging.
    2018-02-07T21:55:46.757+0000 E STORAGE  [initandlisten] WiredTiger error (24) [1518040546:757829][1803:0x7f90823bcd00], WT_SESSION.create: /var/lib/mongodb/: directory-sync: open: Too many open files
    2018-02-07T21:55:47.025+0000 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195
    2018-02-07T21:55:47.025+0000 I -        [initandlisten] 
 
    ***aborting after invariant() failure
 
 
    2018-02-07T21:55:47.046+0000 F -        [initandlisten] Got signal: 6 (Aborted).



 Comments   
Comment by Mark Agarunov [ 09/Feb/18 ]

Hello chrisspen,

Thank you for the report and I'm glad this is working for you now. Generally we recommend setting the ulimit settings on the higher end, but this is dependent on your workload.

Thanks,
Mark

Comment by Chris Spencer [ 08/Feb/18 ]

I re-ran the repair command after my changes limits.conf, and that seems to have fixed the problem.

It seems that the limits in the default daemon are insufficient for a lot of larger databases.

Generated at Thu Feb 08 04:32:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.