[SERVER-17687] Fatal assertion after pthread_create fails with Resource temporarily unavailable Created: 23/Mar/15  Updated: 24/Mar/21  Resolved: 01/Apr/15

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 3.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dmitriy Selivanov Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File mongodb_error_log    
Operating System: ALL
Participants:

 Description   

I'm trying to back up my MongoDB collection to Amazon S3 using Spark, so my cluster only reads data from the collection, in many threads. After a few hours the MongoDB server crashes. See the log in the attachment. If you need more details, please ask and I will provide additional information.



 Comments   
Comment by Ramon Fernandez Marina [ 24/Mar/21 ]

vasanth3g@gmail.com, please note it's always better to open a new ticket than commenting on an old, closed one.

That said, with the information above my initial assessment is that testmanager is opening a lot of connections, but those are not being closed properly, leaving the sockets in CLOSE_WAIT and at some point exhausting all available file descriptors. At that point, mongod is unable to create new threads, and that's why you see the error message you're seeing.

If you believe this could be a bug in mongod feel free to open a new ticket, but https://developer.mongodb.com/community/forums/ sounds more appropriate.

Thanks,
Ramón.

Comment by Vasanth M.Vasanth [ 24/Mar/21 ]

Hi Ramon, 

We are running 32k connections on MongoDB and are seeing the error below in the log file. Following suggestions from the MongoDB community, we checked pid_max and threads-max; both are reasonably high, but the number of open sockets is high, which means sockets are not being closed, as shown below.

  1. Centos 8.1
  2. Mongo version 3.6.17

$ cat /proc/sys/kernel/pid_max
4194304
$ cat /proc/sys/kernel/threads-max
94465
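Both limits only matter relative to how many threads already exist, and every thread also consumes an ID from the same space as PIDs, so pid_max caps the thread total too. A minimal Python sketch for checking the remaining headroom on Linux, assuming the proc(5) layout of /proc/loadavg, whose fourth field is "runnable/total" and whose total counts all existing processes and threads. The function names are hypothetical helpers, not part of any MongoDB tool. If mongod uses one thread per connection (an assumption to verify for your configuration), 32k connections alone would account for roughly a third of a threads-max of 94465.

```python
def total_threads_from_loadavg(loadavg_text: str) -> int:
    """Return the number of kernel scheduling entities (processes + threads).

    The 4th field of /proc/loadavg looks like "1/123": runnable/total.
    """
    runnable_total = loadavg_text.split()[3]
    return int(runnable_total.split("/")[1])


def read_int(path: str) -> int:
    """Read a single integer sysctl such as /proc/sys/kernel/threads-max."""
    with open(path) as f:
        return int(f.read().strip())


def thread_headroom() -> dict:
    """Live check on Linux: how far the system is from threads-max and pid_max."""
    with open("/proc/loadavg") as f:
        total = total_threads_from_loadavg(f.read())
    threads_max = read_int("/proc/sys/kernel/threads-max")
    pid_max = read_int("/proc/sys/kernel/pid_max")
    return {
        "threads": total,
        "threads_max_left": threads_max - total,
        # Approximate: thread IDs share the PID space, so pid_max also
        # bounds the total number of threads on the system.
        "pid_max_left": pid_max - total,
    }
```

Calling `thread_headroom()` on the affected host while the errors occur shows whether either system-wide limit is actually close to being reached.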

 

{{mongod 8955 root *366u IPv4 120213606 0t0 TCP testmanager:33445->node03:49816 (CLOSE_WAIT)
mongod 8955 root *367u IPv4 120213789 0t0 TCP testmanager:33445->node03:49860 (CLOSE_WAIT)
mongod 8955 root *368u IPv4 120402126 0t0 TCP testmanager:33445->node03:49864 (CLOSE_WAIT)
mongod 8955 root *369u IPv4 120437763 0t0 TCP testmanager:33445->node03:49866 (CLOSE_WAIT)}}
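Rather than scanning lsof output by eye, the sockets on the mongod port can be counted per TCP state. A minimal Python sketch, assuming IPv4 sockets (as the lsof output shows) and the Linux /proc/net/tcp layout, where the local address is `hexip:hexport` and the state is a two-digit hex code (08 is CLOSE_WAIT, 06 is TIME_WAIT). The function name is hypothetical; 33445 is the mongod port from the listing above.

```python
from collections import Counter

# State codes printed in /proc/net/tcp (values of the kernel's TCP state enum).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states_for_port(proc_net_tcp: str, local_port: int) -> Counter:
    """Count sockets per TCP state whose local port equals local_port.

    proc_net_tcp is the text of /proc/net/tcp (IPv4 only; /proc/net/tcp6
    uses the same columns with longer addresses).
    """
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        port = int(local_addr.rsplit(":", 1)[1], 16)  # port is hex-encoded
        if port == local_port:
            counts[TCP_STATES.get(state, state)] += 1
    return counts
```

For example, `count_states_for_port(open("/proc/net/tcp").read(), 33445)["CLOSE_WAIT"]` sampled every few minutes shows whether CLOSE_WAIT sockets keep growing.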

After some time, the socket descriptors reach the maximum limit and mongod throws an error on thread creation.
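To confirm that descriptors are what runs out, compare the number of descriptors the mongod process has open (PID 8955 in the lsof output above) with its "Max open files" soft limit. A minimal Python sketch, assuming the Linux /proc/<pid>/limits table layout and enough privileges to read another process's /proc entries; `parse_limit` and `fd_usage` are hypothetical helper names.

```python
import os
from typing import Optional, Tuple


def parse_limit(limits_text: str, name: str) -> Optional[int]:
    """Return the soft limit for row `name` of /proc/<pid>/limits.

    Returns None when the soft limit is "unlimited"; raises KeyError if
    the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith(name):
            soft = line[len(name):].split()[0]  # first column after the name
            return None if soft == "unlimited" else int(soft)
    raise KeyError(name)


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptors, soft RLIMIT_NOFILE) for a running process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_limit(f.read(), "Max open files")
    return open_fds, soft
```

If `fd_usage(8955)` returns a count at or near the soft limit, descriptor exhaustion is confirmed.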

 

{{2021-02-24T22:50:04.692+0000 I - [listener] pthread_create failed: Resource temporarily unavailable
2021-02-24T22:50:04.692+0000 W EXECUTOR [conn480782] Terminating session due to error: InternalError: failed to create service entry worker thread
2021-02-24T22:50:05.589+0000 I - [listener] pthread_create failed: Resource temporarily unavailable
2021-02-24T22:50:05.589+0000 W EXECUTOR [conn480783] Terminating session due to error: InternalError: failed to create service entry worker thread}}

https://jira.mongodb.org/browse/SERVER-17687

The observation below is copied from the Jira ticket above.

If the issue is not the system-wide limit on the number of threads then the resource exhaustion is somewhere else. You'll need to investigate what resource is being exhausted (memory and number of file descriptors / sockets are the usual suspects) or simply lower the number of threads. If you're not using connection pooling you're probably running out of sockets (netstat -a | grep TIME_WAIT may help).

Based on our analysis, socket descriptors are getting exhausted and mongod thread creation is failing. Any suggestions on why the sockets are not being closed, or any workaround for this?

Thanks,
Vasanth

Comment by Ramon Fernandez Marina [ 01/Apr/15 ]

dselivanov, I see the following in the man page for pthread_create(3):

EAGAIN
Insufficient resources to create another thread, or a system-imposed limit on the number of threads was encountered. The latter case may occur in two ways: the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes for a real user ID, was reached; or the kernel's system-wide limit on the number of threads, /proc/sys/kernel/threads-max, was reached.
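The two limit cases in the man page can be checked mechanically; if neither is reached, the shortage is elsewhere, as the next paragraph says. A minimal Python sketch, assuming Linux /proc/<pid>/status files with `Uid:` and `Threads:` rows; the function names are hypothetical. The system-wide thread count can come from the total in the fourth field of /proc/loadavg. Per getrlimit(2), RLIMIT_NPROC is not enforced for processes with real user ID 0 (or with CAP_SYS_ADMIN/CAP_SYS_RESOURCE), which matters because the lsof output in this ticket shows mongod running as root.

```python
import os
from typing import List, Optional


def eagain_suspects(user_threads: int, nproc_soft: Optional[int],
                    system_threads: int, threads_max: int,
                    is_root: bool = False) -> List[str]:
    """Return which limit named in pthread_create(3) would explain EAGAIN.

    An empty list means neither limit is reached, so look at other
    resources (memory, for example).
    """
    suspects = []
    # RLIMIT_NPROC is not enforced for real UID 0 (see getrlimit(2)).
    if not is_root and nproc_soft is not None and user_threads >= nproc_soft:
        suspects.append("RLIMIT_NPROC")
    if system_threads >= threads_max:
        suspects.append("threads-max")
    return suspects


def threads_of_user(uid: int) -> int:
    """Sum the Threads: field of every process whose real UID is uid."""
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                status = dict(line.split(":", 1) for line in f if ":" in line)
            real_uid = int(status["Uid"].split()[0])  # first Uid column is real
            threads = int(status["Threads"])
        except (OSError, KeyError, ValueError):
            continue  # process exited or status unreadable
        if real_uid == uid:
            total += threads
    return total
```

The RLIMIT_NPROC soft limit itself can be read from the "Max processes" row of /proc/<pid>/limits for the mongod process.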

If the issue is not the system-wide limit on the number of threads then the resource exhaustion is somewhere else. You'll need to investigate what resource is being exhausted (memory and number of file descriptors / sockets are the usual suspects) or simply lower the number of threads. If you're not using connection pooling you're probably running out of sockets (netstat -a | grep TIME_WAIT may help).

I don't see any evidence of a bug in the server, and since the SERVER project is for reporting bugs or feature suggestions for the MongoDB server I'm going to resolve this issue. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group.

Regards,
Ramón.

Comment by Dmitriy Selivanov [ 23/Mar/15 ]

982897
I believe this is quite a big limit?

Comment by Ramon Fernandez Marina [ 23/Mar/15 ]

dselivanov, there's an error message in the log that says:

pthread_create: Resource temporarily unavailable

You may need to lower the number of threads, or increase some system limits. I think the error above means pthread_create returned EAGAIN; can you check the maximum number of threads allowed on this system?

$ cat /proc/sys/kernel/threads-max

Generated at Thu Feb 08 03:45:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.