[SERVER-9818] Problems when starting MongoDB with > 1023 FDs Created: 30/May/13  Updated: 11/Jul/16  Resolved: 10/Sep/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.3, 2.5.0
Fix Version/s: 2.5.3

Type: Bug
Priority: Critical - P2
Reporter: Oliver John
Assignee: Matt Dannenberg
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

RHEL 6.3 x86_64


Attachments: mongo.log
Issue Links:
Depends
depends on SERVER-2114 Don't use select timeouts for fast co... Closed
depends on SERVER-9876 cap the default initial oplog size Closed
Duplicate
Related
related to SERVER-17653 ERROR: socket XXX is higher than 1023... Closed
is related to SERVER-15389 Cannot start mongod when opening too ... Closed
Backwards Compatibility: Fully Compatible
Operating System: Linux
Steps To Reproduce:

Start a mongod instance whose local database uses more than 1024 datafiles.

Participants:

 Description   

We are running a 3-node replSet with lots of data and currently have more than 1500 datafiles. On restart, mongod fails with the error messages shown in the logs below.
This appears to be related to the FD_SETSIZE limit of 1024, even though ulimit -n is set correctly (ulimit -n 550000).
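
To illustrate why raising ulimit -n does not help: select() tracks descriptors in an fd_set, a fixed-size bitmap with FD_SETSIZE bits, so a descriptor numbered FD_SETSIZE or higher cannot be watched no matter how many descriptors the process is allowed to open. The sketch below is only a model of that distinction. The value 1024 for FD_SETSIZE is an assumption (the usual Linux/glibc value), and select_can_watch and describe are hypothetical helpers, not MongoDB code.

```python
# Illustrative model only, not MongoDB source code.
# Assumption: FD_SETSIZE is 1024, the usual value on Linux with glibc.
FD_SETSIZE = 1024


def select_can_watch(fd: int, fd_setsize: int = FD_SETSIZE) -> bool:
    """Return True if fd fits into a select() fd_set bitmap."""
    return 0 <= fd < fd_setsize


def describe(fd: int, nofile_limit: int) -> str:
    """Report whether fd may be opened and whether select() can watch it."""
    openable = fd < nofile_limit      # governed by ulimit -n
    watchable = select_can_watch(fd)  # governed by FD_SETSIZE
    return f"fd {fd}: openable={openable}, select-watchable={watchable}"


if __name__ == "__main__":
    # Descriptor numbers as reported in this ticket; limit as set via ulimit -n.
    for fd in (1023, 1200, 1201):
        print(describe(fd, nofile_limit=550000))
```

With the ulimit from this report, descriptors 1200 and 1201 can be opened but fall outside the range select() can represent.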

This bug might correspond to https://jira.mongodb.org/browse/SERVER-8515

Version 2.4.3

Thu May 30 15:57:27.912 [initandlisten] db version v2.4.3
Thu May 30 15:57:27.912 [initandlisten] git version: fe1743177a5ea03e91e0052fb5e2cb2945f6d95f
Thu May 30 15:57:27.912 [initandlisten] allocator: tcmalloc
Thu May 30 15:57:27.912 [initandlisten] options:

{ config: "/etc/mongod-longterm.conf", dbpath: "/data/mongo-longterm", fork: "true", logappend: "true", logpath: "/var/log/mongo/mongod-longterm.log", pidfilepath: "/var/run/mongodb/mongod-longterm.pid", port: 30001, replSet: "rs-longterm" }

Thu May 30 15:57:27.937 [initandlisten] journal dir=/data/mongo-longterm/journal
Thu May 30 15:57:27.938 [initandlisten] recover : no journal files present, no recovery needed
Thu May 30 15:57:43.596 [websvr] ERROR: socket 1201 is higher than 1023; not supported
Thu May 30 15:57:43.596 [initandlisten] ERROR: socket 1200 is higher than 1023; not supported
Thu May 30 15:57:43.597 [initandlisten] now exiting
Thu May 30 15:57:43.597 dbexit:
Thu May 30 15:57:43.597 [initandlisten] shutdown: going to close listening sockets...
Thu May 30 15:57:43.597 [initandlisten] closing listening socket: 1199
Thu May 30 15:57:43.597 [initandlisten] closing listening socket: 1200
Thu May 30 15:57:43.597 [initandlisten] closing listening socket: 1201

--------------------------------

Version 2.5.1-pre from github

Thu May 30 15:58:30.605 [initandlisten] db version v2.5.1-pre-
Thu May 30 15:58:30.605 [initandlisten] git version: 3b1257a5224d1e9df71ee1b4631ca32f2cf438f2
Thu May 30 15:58:30.605 [initandlisten] allocator: tcmalloc
Thu May 30 15:58:30.605 [initandlisten] options:

{ config: "/etc/mongod-longterm.conf", dbpath: "/data/mongo-longterm", fork: "true", logappend: "true", logpath: "/var/log/mongo/mongod-longterm.log", pidfilepath: "/var/run/mongodb/mongod-longterm.pid", port: 30001, replSet: "rs-longterm" }

Thu May 30 15:58:30.633 [initandlisten] journal dir=/data/mongo-longterm/journal
Thu May 30 15:58:30.633 [initandlisten] recover : no journal files present, no recovery needed
Thu May 30 15:58:30.923 [initandlisten] ERROR: socket 1201 is higher than 1023; not supported
Thu May 30 15:58:30.926 [initandlisten] now exiting
Thu May 30 15:58:30.926 [IndexRebuilder] assertion 11600 interrupted at shutdown ns:local.system.namespaces query:{}
Thu May 30 15:58:30.926 dbexit:
Thu May 30 15:58:30.926 [initandlisten] shutdown: going to close listening sockets...
Thu May 30 15:58:30.926 [initandlisten] closing listening socket: 1200
Thu May 30 15:58:30.926 [IndexRebuilder] warning: index rebuilding did not complete
Thu May 30 15:58:30.926 [initandlisten] closing listening socket: 1201



 Comments   
Comment by auto [ 19/Sep/13 ]

Author:

{u'username': u'dannenberg', u'name': u'Matt Dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-9818 fix race in jstests/slowNightly/httpinterface.js
Branch: master
https://github.com/mongodb/mongo/commit/9bb04443ccbcdd4f462cb2f07532c977a39bbf0d

Comment by auto [ 10/Sep/13 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-9818 allocate socket file descriptors before opening datafiles
Branch: master
https://github.com/mongodb/mongo/commit/e28400200538da57d4183c58925de511c2ad8e66
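
The idea named in the commit message can be sketched with a small model. POSIX assigns each newly opened file or socket the lowest-numbered descriptor that is currently unused, so whichever resources are opened first get the low numbers. The code below is a simulation under that rule, not the actual patch. FdTable and startup are hypothetical names, and the counts (1500 datafiles, 3 sockets) are illustrative.

```python
from typing import List

# Illustrative model only, not MongoDB source code.
# Assumption: FD_SETSIZE is 1024, the usual value on Linux with glibc.
FD_SETSIZE = 1024


class FdTable:
    """Mimics the POSIX rule: hand out the lowest unused descriptor number."""

    def __init__(self, reserved: int = 3) -> None:
        # Descriptors 0, 1 and 2 stand for stdin, stdout and stderr.
        self.in_use = set(range(reserved))

    def allocate(self) -> int:
        fd = 0
        while fd in self.in_use:
            fd += 1
        self.in_use.add(fd)
        return fd


def startup(datafiles: int, sockets: int, sockets_first: bool) -> List[int]:
    """Return the descriptor numbers the listening sockets end up with."""
    table = FdTable()
    if sockets_first:
        socket_fds = [table.allocate() for _ in range(sockets)]
        for _ in range(datafiles):
            table.allocate()
    else:
        for _ in range(datafiles):
            table.allocate()
        socket_fds = [table.allocate() for _ in range(sockets)]
    return socket_fds


if __name__ == "__main__":
    for first in (False, True):
        fds = startup(datafiles=1500, sockets=3, sockets_first=first)
        ok = all(fd < FD_SETSIZE for fd in fds)
        print(f"sockets_first={first}: socket fds={fds}, usable with select={ok}")
```

In this model the sockets land above 1023 when the datafiles are opened first, and stay in the single digits when the sockets are allocated first.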

Comment by auto [ 04/Sep/13 ]

Author:

{u'username': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@10gen.com'}

Message: Revert "SERVER-9818 allocate socket file descriptors before opening datafiles"

This reverts commit 607ae1384b26ebcb5e275a845f9806104852257f.
Branch: master
https://github.com/mongodb/mongo/commit/4d905c46a1cbc6f96bdd0522d0bd08f0c6c3f1c0

Comment by auto [ 04/Sep/13 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-9818 allocate socket file descriptors before opening datafiles
Branch: master
https://github.com/mongodb/mongo/commit/607ae1384b26ebcb5e275a845f9806104852257f

Comment by Shaun Verch [ 10/Jun/13 ]

Hi Oliver,

Even in standalone mode I believe that mongod will still try to open all these local database files. If that procedure works for you, let us know; if it doesn't, also let us know and we can walk you through the process of using mongodump and mongorestore.

Thanks!
~Shaun Verch

Comment by Oliver John [ 10/Jun/13 ]

Hi Shaun,

First of all, thanks for your support! We're currently decreasing the oplogSize and trying to start over with a fresh local db as described in http://docs.mongodb.org/manual/tutorial/change-oplog-size/.

This should fix our problem for now, I think.

Thanks again and cheers,
Oliver

Comment by Shaun Verch [ 07/Jun/13 ]

Hi Tim,

By default, MongoDB preallocates the oplog as a function of disk size. You can change this with the --oplogSize option to mongod. See http://docs.mongodb.org/manual/reference/configuration-options/#oplogSize for more details. The unreasonable default for the oplog size will be fixed in SERVER-9876.
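
A rough back-of-the-envelope calculation shows how a disk-proportional oplog can produce this many datafiles. The constants below are assumptions, not values stated in this ticket: an uncapped default of 5% of free disk space for the oplog, and a maximum of about 2 GiB per datafile. The 50 TB figure is the disk size mentioned in another comment here, treated as 50 TiB of free space for simplicity. The function names are hypothetical.

```python
# Back-of-the-envelope estimate; every constant here is an assumption.
OPLOG_PERCENT = 5                   # assumed default: 5% of free disk space
MAX_DATAFILE_BYTES = 2 * 1024**3    # assumed cap of about 2 GiB per datafile


def default_oplog_bytes(free_disk_bytes: int) -> int:
    """Oplog size under the assumed uncapped percentage default."""
    return free_disk_bytes * OPLOG_PERCENT // 100


def datafiles_needed(oplog_bytes: int) -> int:
    """Number of datafiles needed to hold the oplog (ceiling division)."""
    return -(-oplog_bytes // MAX_DATAFILE_BYTES)


if __name__ == "__main__":
    free_disk = 50 * 1024**4  # 50 TiB, approximating the 50TB disk in this ticket
    oplog = default_oplog_bytes(free_disk)
    print(f"oplog: {oplog / 1024**4:.2f} TiB in {datafiles_needed(oplog)} datafiles")
```

Under these assumptions the oplog alone needs 1280 datafiles, which already exceeds the 1023 descriptor ceiling, while an explicitly small --oplogSize keeps the local database to a handful of files.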

Have you inserted any data yet? Can you just wipe the data directories and start over? If not, let us know and we can walk you through the dump restore process.

Thanks,
~Shaun Verch

Comment by Tim Eggert [ 07/Jun/13 ]

Hi Shaun,

Thank you very much for your detailed answer.
We do not store any significant data in the local database. When we started MongoDB on the 50TB HDD, the local database files were created and allocated automatically.

Is it possible to circumvent this behaviour?

Thanks and kind regards,

Tim

Comment by Shaun Verch [ 07/Jun/13 ]

Hi Oliver,

It appears that the database with many data files on your system is the local database. While you can store information here, MongoDB also uses the local database for metadata and handles it differently from other databases. What are you storing in the local database?

One thing you can do to get around this issue is to run mongodump in "direct client" mode. This will interact with the data files directly without actually starting up the server:

./mongodump --dbpath <path to your data files> -c <collection name> -d local <name of dump destination>

Then, you can restore all your data into a different database using the mongorestore command.

With a mongod instance running:

./mongorestore --host <hostname> --port <port> -c <collection name> -d <db name> <name of collection file>

Without a mongod instance running:

./mongorestore --dbpath <path to your data files> -c <collection name> -d <db name> <name of collection file>

Quick example:

./mongodump --dbpath /data/db/ -c coll -d local dump/
./mongorestore --dbpath /data/newdb/ -c coll -d newdb dump/local/coll.bson

You can also adapt this to dump/restore the entire local database.

The deeper issue here will be solved when our call to select is replaced as part of SERVER-2114.
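
As a general illustration of what replacing select can look like: poll() takes a list of descriptors rather than a fixed-size bitmap, so it is not bound by FD_SETSIZE. This ticket does not say which mechanism SERVER-2114 adopts, so the sketch below is only one common alternative, not the planned change. wait_readable is a hypothetical helper, and select.poll is available on Linux and most Unix systems but not on Windows.

```python
import select
import socket


def wait_readable(sock: socket.socket, timeout_ms: int) -> bool:
    """Wait until sock is readable, using poll() instead of select()."""
    poller = select.poll()
    # poll() works on descriptor numbers directly, with no FD_SETSIZE bound.
    poller.register(sock.fileno(), select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(mask & select.POLLIN for _fd, mask in events)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(wait_readable(a, 100))  # nothing has been sent yet
        b.sendall(b"x")
        print(wait_readable(a, 100))  # one byte is now pending
    finally:
        a.close()
        b.close()
```

The same call works unchanged for a socket whose descriptor number is above 1023, which is the case that fails in the logs of this ticket.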

Thanks,
~Shaun Verch

Comment by Oliver John [ 06/Jun/13 ]

I attached a logfile with the full verbose output (-vvvvv).

Thanks,
Oliver

Comment by Shaun Verch [ 04/Jun/13 ]

Hi Oliver and Tim,

We are having some issues reproducing the problem in this ticket. Could you try running mongod with the -vvvvv option and post the output? That would give us more information about what files the database is opening on startup.

Thanks!
~Shaun Verch

Comment by Shaun Verch [ 03/Jun/13 ]

Hi Tim,

We are working on a reproduction of the issue. We'll post updates as we have them.

In the meantime, it would be helpful if you also posted some details about your environment.

In particular:

1. Number of data files you have
2. Log output from when you attempted to start up

Thanks,
~Shaun Verch

Comment by Tim Eggert [ 03/Jun/13 ]

Hi,

Could you please give an update on the current status of this ticket? We are experiencing the same issue in our production environments.

Thank you very much and kind regards,

Tim Eggert
