[SERVER-24812] Thread starvation even with proper ulimits Created: 27/Jun/16  Updated: 14/Jul/16  Resolved: 27/Jun/16

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Travis Thieman Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

3.2.5 WT



 Description   

Hi Mongo,

We had a bit of a problem this weekend with one of our primaries. The primary was unable to create new threads to handle incoming connections, which effectively took it down. However, it kept responding to the rest of its replica set (presumably on older, long-lived threads), so no automatic failover took place. During the failure, the primary's log was filled with these two lines, repeated ad infinitum:

2016-06-25T23:49:29.455+0000 I NETWORK  [initandlisten] failed to create thread after accepting new connection, closing connection
2016-06-25T23:49:29.457+0000 I NETWORK  [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
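For reference, errno 11 on Linux is EAGAIN. The Linux pthread_create(3) man page lists several limits that can produce it: the RLIMIT_NPROC soft limit (ulimit -u, charged per real user ID and counting threads as well as processes), the system-wide /proc/sys/kernel/threads-max, and /proc/sys/kernel/pid_max. A quick check of the mapping, assuming a Linux host with glibc (the strerror text may differ on other platforms):

```python
import errno
import os

# The value reported by mongod in "pthread_create failed: errno:11".
LOGGED_ERRNO = 11

# On Linux, EAGAIN is 11 (EWOULDBLOCK is an alias with the same value).
is_eagain = LOGGED_ERRNO == errno.EAGAIN

# Human-readable text, matching the "Resource temporarily unavailable" in the log.
message = os.strerror(LOGGED_ERRNO)

print(f"errno {LOGGED_ERRNO}: EAGAIN={is_eagain}, strerror={message!r}")
```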

My guess is that this problem is usually operator error caused by misconfigured resource limits. I think ours were fine, though, so I'm a bit puzzled.

Here's the output from ulimit -a on this system:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515188
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515188
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Also, /proc/sys/kernel/threads-max is set to 1030376. We restarted the primary once we detected the problem, and since then mongod seems to hover between 1K and 5K threads under our usual load patterns (as measured by ps -eLf | grep mongo | wc -l). I don't see how we could have exceeded the limits shown here, so I suspect I'm misunderstanding something.
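Two details may matter when comparing these numbers. ulimit -a reports the limits of the shell it runs in, while a daemon started by an init system can run with different limits, which the kernel exposes in /proc/<pid>/limits. RLIMIT_NPROC (ulimit -u) is also checked against all threads owned by the process's real user ID, not only mongod's. Separately, ps -eLf | grep mongo counts the threads of every process whose command line contains "mongo", plus the grep itself. Below is a minimal sketch, assuming Linux's /proc layout; the helper names (read_limit, status_fields, thread_count, real_uid, threads_for_uid, report) are hypothetical, not part of any MongoDB tooling.

```python
import os
from pathlib import Path


def read_limit(limits_text: str, name: str) -> tuple[str, str]:
    """Return (soft, hard) for a row such as 'Max processes' in /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith(name):
            soft, hard = line[len(name):].split()[:2]
            return soft, hard
    raise KeyError(name)


def status_fields(pid: int) -> dict[str, str]:
    """Parse the 'Key: value' lines of /proc/<pid>/status into a dict."""
    text = Path(f"/proc/{pid}/status").read_text()
    return dict(line.split(":", 1) for line in text.splitlines() if ":" in line)


def thread_count(pid: int) -> int:
    """Threads in one process: the kernel exposes one directory per thread in /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))


def real_uid(pid: int) -> int:
    """The first column of the Uid line is the real UID, which RLIMIT_NPROC is charged to."""
    return int(status_fields(pid)["Uid"].split()[0])


def threads_for_uid(uid: int) -> int:
    """Approximate total threads owned by a real UID across all processes."""
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            fields = status_fields(int(entry))
        except OSError:
            continue  # the process exited between listdir() and the read
        if int(fields["Uid"].split()[0]) == uid:
            total += int(fields["Threads"])
    return total


def report(pid: int) -> str:
    """Compare one process's thread usage with the limits the kernel applies to it."""
    soft, hard = read_limit(Path(f"/proc/{pid}/limits").read_text(), "Max processes")
    uid = real_uid(pid)
    return (f"pid {pid}: {thread_count(pid)} threads; "
            f"uid {uid}: {threads_for_uid(uid)} threads in total; "
            f"Max processes soft={soft} hard={hard}")


if __name__ == "__main__":
    # Replace os.getpid() with mongod's PID (for example from `pidof mongod`).
    print(report(os.getpid()))
```

The per-UID total is a snapshot and only approximate, since processes can start or exit while /proc is being scanned.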

Two questions for you:

1. Which limit might we be exceeding to cause these pthread_create errors? Perhaps I am interpreting the ulimit output incorrectly.
2. Any suggestions for system metrics we might monitor to detect and prevent this sort of problem going forward?
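For the second question, one option is to alert on remaining headroom rather than on absolute thread counts. The sketch below, assuming Linux, reads the system-wide number of kernel scheduling entities (processes plus threads) from the fourth field of /proc/loadavg and compares it with kernel.threads-max and kernel.pid_max; because each thread is assigned an ID from the PID space, the task count approximates PID usage. The helper names and the 80% threshold are hypothetical choices, and per-user RLIMIT_NPROC usage would need a separate check.

```python
from pathlib import Path


def read_int(path: str) -> int:
    """Read the first integer from a /proc or /proc/sys file."""
    return int(Path(path).read_text().split()[0])


def system_task_count() -> int:
    """Processes plus threads on the whole system.

    The fourth field of /proc/loadavg looks like '2/1234':
    runnable / total kernel scheduling entities.
    """
    return int(Path("/proc/loadavg").read_text().split()[3].split("/")[1])


def headroom_alerts(used: int, limits: dict[str, int], threshold: float = 0.8) -> list[str]:
    """Return one message per limit whose usage is at or above `threshold`."""
    alerts = []
    for name, limit in limits.items():
        fraction = used / limit
        if fraction >= threshold:
            alerts.append(f"{name}: {used}/{limit} ({fraction:.0%})")
    return alerts


if __name__ == "__main__":
    tasks = system_task_count()
    limits = {
        "kernel.threads-max": read_int("/proc/sys/kernel/threads-max"),
        "kernel.pid_max": read_int("/proc/sys/kernel/pid_max"),
    }
    print(f"{tasks} tasks in use")
    for alert in headroom_alerts(tasks, limits):
        print("WARNING", alert)
```

Run periodically from cron or an existing monitoring agent, a check like this can flag exhaustion before pthread_create starts failing.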

Thanks much,
Travis



 Comments   
Comment by Ramon Fernandez Marina [ 27/Jun/16 ]

Thanks for your report, travis@gamechanger.io. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this, which involves more discussion, would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.

Generated at Thu Feb 08 04:07:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.