[SERVER-31344] Process limit not handled gracefully Created: 30/Sep/17  Updated: 27/Oct/23  Resolved: 17/Oct/17

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 3.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Siarhiej Jaskievič Assignee: Andrew Morrow (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Set ulimit -u to 1k, then sent requests to mongodb, opening new connections (close to 1k); a reproduction sketch follows the backtrace below.
For some time mongod worked as expected, logging errors like:

2017-09-30T11:13:48.152+0300 I NETWORK  [thread1] connection accepted from [XXX]:47920 #1881 (930 connections now open)
2017-09-30T11:13:48.152+0300 I -        [thread1] pthread_create failed: Resource temporarily unavailable
2017-09-30T11:13:48.152+0300 I -        [thread1] failed to create service entry worker thread for [XXX]:47920
 
But then it suddenly threw this error:
 
2017-09-30T11:13:48.191+0300 I REPL     [rsBackgroundSync] sync source candidate: YYYY
2017-09-30T11:13:48.191+0300 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to YYYY
2017-09-30T11:13:48.191+0300 F ASIO     [NetworkInterfaceASIO-RS-0] Uncaught exception in NetworkInterfaceASIO IO worker thread of type: UnknownError: Caught std::exception of type std::system_error: thread: Resource temporarily unavailable
2017-09-30T11:13:48.191+0300 I -        [NetworkInterfaceASIO-RS-0] Fatal Assertion 28820 at src/mongo/executor/network_interface_asio.cpp 168
2017-09-30T11:13:48.191+0300 I -        [NetworkInterfaceASIO-RS-0]
 
***aborting after fassert() failure
 
 
2017-09-30T11:13:48.195+0300 F -        [NetworkInterfaceASIO-RS-0] Got signal: 6 (Aborted).
 
 0x56467a204f31 0x56467a204029 0x56467a20450d 0x7faa6adc1cb0 0x7faa6aa27035 0x7faa6aa2a79b 0x5646794c7b59 0x564679f80f3a 0x56467ac73790 0x7faa6adb9e9a 0x7faa6aae72ed
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"564678CCC000","o":"1538F31","s":"_ZN5mongo15printStackTraceERSo"},{"b":"564678CCC000","o":"1538029"},{"b":"564678CCC000","o":"153850D"},{"b":"7FAA6ADB2000","o":"FCB0"},{"b":"7FAA6A9F1000","o":"36035","s":"gsignal"},{"b":"7FAA6A9F1000","o":"3979B","s":"abort"},{"b":"564678CCC000","o":"7FBB59","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"564678CCC000","o":"12B4F3A"},{"b":"564678CCC000","o":"1FA7790"},{"b":"7FAA6ADB2000","o":"7E9A"},{"b":"7FAA6A9F1000","o":"F62ED","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.3", "gitVersion" : "f07437fb5a6cca07c10bafa78365456eb1d6d5e1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.68-33", "version" : "#1 SMP Mon May 15 15:50:23 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "564678CCC000", "elfType" : 3, "buildId" : "7F4B6253104ED1F56A3EBE57764D712224D2EE74" }, { "b" : "7FFF0C18E000", "elfType" : 3, "buildId" : "EAE1311A713DED3C92BFD94265DF468A7189F7CD" }, { "b" : "7FAA6B6E5000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "99255CAB5455CB9EDFA0270CDAA6B6A7BBEF2E1B" }, { "b" : "7FAA6B4E1000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "A51A5921F4E05E4D20B165D398BA4D563960DA9A" }, { "b" : "7FAA6B1E5000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "6D3D633C88F7E9835D180ACE648CEDB21C8021B7" }, { "b" : "7FAA6AFCF000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "ECF322A96E26633C5D10F18215170DD4395AF82C" }, { "b" : "7FAA6ADB2000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9B1F69F5DC3A6820BB3CA4B2DB147ABAA486A41A" }, { "b" : "7FAA6A9F1000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "68FC0E76A868E47807E3604B02D8BAA580A4E2CB" }, { "b" : "7FAA6B8ED000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2B91CD40CE35626DAB827FEEE08F671253FA7B88" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x56467a204f31]
 mongod(+0x1538029) [0x56467a204029]
 mongod(+0x153850D) [0x56467a20450d]
 libpthread.so.0(+0xFCB0) [0x7faa6adc1cb0]
 libc.so.6(gsignal+0x35) [0x7faa6aa27035]
 libc.so.6(abort+0x17B) [0x7faa6aa2a79b]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5646794c7b59]
 mongod(+0x12B4F3A) [0x564679f80f3a]
 mongod(+0x1FA7790) [0x56467ac73790]
 libpthread.so.0(+0x7E9A) [0x7faa6adb9e9a]
 libc.so.6(clone+0x6D) [0x7faa6aae72ed]
-----  END BACKTRACE  -----

This was happening with every replica in the replica set (because of the constant stream of requests), and the replica set became unstable.
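
A minimal Python sketch of the reproduction, assuming mongod itself was started in a shell where ulimit -u 1024 had been applied first; the host, port, connection count, and use of the pymongo driver are illustrative assumptions, not part of the original report:

from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Open ~1k client connections against a mongod running under a low
# process/thread limit (ulimit -u). Each accepted connection makes mongod
# spawn a service worker thread, eventually exhausting the limit.
clients = []
for i in range(1000):
    try:
        client = MongoClient("mongodb://localhost:27017",
                             serverSelectionTimeoutMS=2000)
        client.admin.command("ping")  # force the connection to be established
        clients.append(client)
    except PyMongoError as exc:
        print(f"connection {i} failed: {exc}")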


 Description   

Hi!
We've run into a problem with unstable mongod behaviour: the replica set's instances were constantly rebooting and almost no master was elected.



 Comments   
Comment by Andrew Morrow (Inactive) [ 17/Oct/17 ]

Hi yasek - I'm closing this ticket since it doesn't seem to have any further activity. Please feel free to re-open or add comments if there is any additional help I can provide.

Comment by Andrew Morrow (Inactive) [ 09/Oct/17 ]

yasek - I'd recommend over-provisioning with respect to any ulimits. We do not currently attempt to gracefully degrade when we are close to violating them. We treat ulimits as quotas that the user has chosen to enforce on us, and consider a violation of those limits as an indication that the system is misbehaving and should be terminated by the OS as the next level of error handling. Attempting to set the ulimits at or just above your expected capacity means that transient events could push you into failure. I would advise dropping ulimit restrictions entirely or setting them to a constant multiple of your expected maximum. To answer your specific question, you are not safe to handle 64k connections with a ulimit set at 64k processes, as mongodb can and does make use of background threads. Please let me know if there is any additional information I can provide to help you set appropriate values.
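
A small Python sketch of that over-provisioning arithmetic; the headroom factor and background-thread allowance below are illustrative assumptions, not values documented by MongoDB:

import resource  # Linux-only query of the process/thread limit (RLIMIT_NPROC)

expected_max_connections = 64_000    # peak client connections you plan for
background_thread_allowance = 1_000  # replication, ASIO workers, etc. (assumed figure)
headroom_factor = 2                  # the "constant multiple" suggested above (assumed)

recommended_nproc = headroom_factor * (expected_max_connections
                                       + background_thread_allowance)

soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"current RLIMIT_NPROC: soft={soft} hard={hard}")
print(f"suggested minimum:    {recommended_nproc}")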

Comment by Kelsey Schubert [ 06/Oct/17 ]

Hi yasek,

If you expect that your workload will generate close to 64k connections, I would recommend increasing your limits to ensure that mongod can continue to access required resources under load. I've modified the ticket summary to describe work to enable mongod to approach these limits more gracefully, and marked it to be considered by our Platforms Team - please continue to watch for updates.

Kind regards,
Kelsey

Comment by Siarhiej Jaskievič [ 03/Oct/17 ]

anonymous.user, any suggestions?

Comment by Siarhiej Jaskievič [ 30/Sep/17 ]

Yes, I realize that increasing ulimits could resolve this issue for such a workload.
Actually, we had higher limits and a heavier workload.
Is there a guarantee that with ulimit set to 64k, mongod will still work after allocating 64k connections?

Comment by Kelsey Schubert [ 30/Sep/17 ]

Hi yasek,

Thank you for the report. mongod requires ulimits to be appropriately set for its workload. Please increase your settings to our recommended values and reevaluate the performance of your replica set to confirm that the issue has been resolved.

Kind regards,
Kelsey
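
One way to verify the limits the running mongod actually sees, since an init system may apply different ulimits than an interactive shell - a sketch assuming Linux, a single mongod process, and the pidof utility:

import subprocess

# Read the effective limits of the running mongod from /proc.
pid = subprocess.check_output(["pidof", "mongod"]).split()[0].decode()
with open(f"/proc/{pid}/limits") as limits:
    for line in limits:
        if "Max processes" in line or "Max open files" in line:
            print(line.rstrip())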
