[SERVER-3776] Server unable to accept more than ~1000 connections due to new ulimit setting in RHEL 6 Created: 07/Sep/11 Updated: 30/Mar/12 Resolved: 06/Jan/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Packaging |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Minor - P4 |
| Reporter: | John Feibusch | Assignee: | Richard Kreuter (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
RHEL 6, MongoDB 1.8.1 |
||
| Description |
|
I had an issue where my Mongo server would not accept more than 1024 connections after I upgraded from RHEL 5.6 to RHEL 6. (I also upgraded Mongo from 1.8.0 to 1.8.1 at the same time). Apparently, RHEL 6 added a soft ulimit for processes of 1024, and Mongo threads count as processes.

Mongo got kind of unstable when this happened - new connection attempts failed, but the connection count in serverStatus continued to increase. The server log showed connections accepted but did not indicate that they had failed. An strace of the server process showed the "clone" syscall failing with EAGAIN.

Since I was starting Mongo with an upstart script, not a shell, using the ulimit command would not work. I fixed the problem by commenting out the "nproc" line in /etc/security/limits.d/90-nproc.conf. After that, the server worked OK.

So, some questions:

1) Do you think that renaming the limits file with "mv /etc/security/limits.d/90-nproc.conf /etc/security/limits.d/90-nproc.conf.rpmsave" is a good idea? That would be much easier to do in my RPM post-install script than editing the file as I did for the first system.

2) Has this been reported before? If so, is there some writeup you can point me to?

3) Can we request an improvement to Mongo so that it handles this situation a little better? It would have saved me an hour of troubleshooting if Mongo put a useful message in the server log when the "clone" syscall failed. Also it would be good to document that the ulimit for max processes must be greater than the largest number of threads that could be connected to the database, at least on this platform.

Thanks! |
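The requirement stated in question 3 (the max-processes limit must exceed the number of connection threads) can be checked before the server starts. The following is only a rough sketch of such a check in Python, not anything MongoDB itself does: the function names and the 4000-thread figure are hypothetical, and it assumes a Linux host where Python's `resource.RLIMIT_NPROC` is available. Note that the kernel counts nproc per user, so other processes owned by the same user also consume the limit.

```python
import resource


def nproc_shortfall(soft_limit, expected_threads):
    """Return how many expected threads would not fit under soft_limit.

    Returns 0 when the limit is unlimited or already large enough.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return 0
    return max(0, expected_threads - soft_limit)


def check_nproc(expected_threads):
    """Return a warning string if this process's soft nproc limit is too low.

    expected_threads is a hypothetical estimate of peak connection threads.
    Returns None when the limit looks sufficient.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    short = nproc_shortfall(soft, expected_threads)
    if short:
        return (
            f"warning: soft nproc limit {soft} is {short} below "
            f"the {expected_threads} threads expected"
        )
    return None
```

For example, `check_nproc(4000)` run under the RHEL 6 default soft limit of 1024 would return a warning, while under a raised or unlimited setting it would return `None`.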
| Comments |
| Comment by Richard Kreuter (Inactive) [ 06/Jan/12 ] |
|
MongoDB can't do a whole lot if the operating system's settings are tuned too low. Versions 2.1 and later now issue a warning for low resource limits. |
| Comment by Richard Kreuter (Inactive) [ 01/Dec/11 ] |
|
Logging warnings about bad rlimits might help. |
| Comment by Richard Kreuter (Inactive) [ 01/Dec/11 ] |
|
This turns out not to be a packaging issue, but a dubiously low distro setting that absolutely needs changing on any host that runs MongoDB. |
| Comment by Scott D [ 08/Sep/11 ] |
|
Another way around this is to use sudo. Add a rule that allows your mongo user id to run any command you like as itself with no password; the ulimits will then be set to your defaults in limits.conf |
| Comment by John Feibusch [ 08/Sep/11 ] |
|
I have attached the upstart file and the limits.conf file. I can't provide the /proc/<pid>/limits file, since I have renamed the limits.conf file on all of our MongoDB servers. |
| Comment by John Feibusch [ 08/Sep/11 ] |
|
This file is part of the base install of RHEL 6. |
| Comment by Richard Kreuter (Inactive) [ 08/Sep/11 ] |
|
Yes, could you attach those 3 files anyway, so I can debug the settings? |
| Comment by John Feibusch [ 08/Sep/11 ] |
|
Check out the doc for runuser in RHEL 6. It has changed. The man page in RHEL 6 says: In RHEL 5.6 it says: I don't know if the actual behavior has changed or if this is just a doc correction, but the current version seems to be behaving correctly according to the current documentation. Do you still want the files you mentioned? |
| Comment by Richard Kreuter (Inactive) [ 08/Sep/11 ] |
|
That's weird, since the available documentation for runuser suggests that the point of it is not to use PAM. Could you attach (1) the upstart config file you're using to start mongod, (2) your /etc/security/limits.d/90-nproc.conf, and (3) the contents of /proc/<pid>/limits, where <pid> is the pid of the mongod you've started via your upstart config? Thanks in advance. |
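For item (3), the limits that actually apply to a running process can be read from /proc/<pid>/limits. A minimal Python sketch of extracting the "Max processes" row is shown here; the helper names are hypothetical, and it assumes the usual Linux layout of that file, with columns separated by runs of two or more spaces and "unlimited" used for no limit.

```python
import re


def _value(field):
    """Convert a limits field to int, or None for 'unlimited'."""
    return None if field == "unlimited" else int(field)


def parse_proc_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines():
        # Columns are padded with spaces; limit names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3 or fields[0] == "Limit":
            continue  # skip blank lines and the header row
        limits[fields[0]] = (_value(fields[1]), _value(fields[2]))
    return limits


def read_limits(pid):
    """Read and parse the limits of the process with the given pid."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_proc_limits(f.read())
```

Calling `read_limits(pid)["Max processes"]` with the pid of the running mongod would then give the soft and hard nproc values the server is really running with, regardless of what the upstart job or limits.d files were intended to set.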
| Comment by John Feibusch [ 07/Sep/11 ] |
|
Richard's idea sounded very good - I was slightly embarrassed for not having thought of it. But it doesn't seem to work. I start the Mongo server with a "runuser" command so that the server will run as the mongo user rather than as root. I think the upstart script in the 10gen package does the same thing. Unfortunately, "runuser" invokes the pam_limits module, which overrides any limits you set within the upstart script. I could run the "ulimit" command within the runuser command, but then it is running as a non-privileged user so the limit can't exceed the hard limit. So I would first have to determine what the hard limit is and then set the soft limit to that value. That's a bit more logic than I'd like to have in an upstart script. |
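The "determine the hard limit, then set the soft limit to it" logic described above could live in a small wrapper instead of the upstart script. This is only a sketch under assumptions: the function names and the idea of a Python launcher are hypothetical, and it relies on the fact that an unprivileged process may raise its soft limit up to, but not beyond, its hard limit.

```python
import resource


def target_limits(soft, hard, wanted):
    """Compute the (soft, hard) pair to request for a desired soft limit.

    The soft limit is raised toward `wanted` but capped at the hard limit,
    and it is never lowered. The hard limit is left unchanged.
    """
    inf = resource.RLIM_INFINITY
    if soft == inf:
        return (soft, hard)  # already unlimited, nothing to raise
    new_soft = wanted if hard == inf else min(wanted, hard)
    return (max(soft, new_soft), hard)


def raise_nproc(wanted):
    """Raise this process's soft nproc limit as far as the hard limit allows."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new = target_limits(soft, hard, wanted)
    if new != (soft, hard):
        resource.setrlimit(resource.RLIMIT_NPROC, new)
    return new
```

A launcher running as the mongo user could call `raise_nproc(16000)` and then replace itself with the server process (for instance with `os.execvp`), since resource limits are inherited across exec. If the hard limit is itself too low, this cannot help, and the limit has to be raised by root or in the PAM configuration.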
| Comment by Richard Kreuter (Inactive) [ 07/Sep/11 ] |
|
It should be possible (for sufficiently recent upstart versions, anyway) to increase the nproc limit in the upstart job itself. If you've a spare RHEL 6 system around to try this on (I haven't), could you add limit nproc 16000 16000 to the upstart file and see if that helps? This way, you don't need to touch the system-wide defaults in /etc/security. |