[SERVER-48509] "failed to create service entry worker thread" discards root cause exception message Created: 30/May/20 Updated: 29/Oct/23 Resolved: 07/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.3.6 |
| Fix Version/s: | 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Oleg Pudeyev (Inactive) | Assignee: | Andrew Chen (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Service arch 2020-06-29, Service arch 2020-07-13 |
| Participants: | |
| Linked BF Score: | 23 |
| Description |
|
I created a client in Ruby configured to establish 20,000 connections to local replica set servers (PSSA), as follows:
At about 6,000 connections per server, the servers start closing connections. Looking in the server log I see:
This message appears to be produced in src/mongo/transport/service_entry_point_utils.cpp:
The above code appears to discard the root cause of the error, making further troubleshooting impossible. As a MongoDB user, I would like the server to provide error messages that indicate the cause of a problem so that I can troubleshoot it. |
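To illustrate the request above, here is a minimal sketch of an error path that keeps the root cause in the returned message. It is not the code in src/mongo/transport/service_entry_point_utils.cpp: `LaunchResult` and `launchWorker` are hypothetical names introduced for this example, and only the message prefix is taken from the ticket title.

```cpp
#include <exception>
#include <functional>
#include <string>
#include <system_error>

// Hypothetical result type for this sketch; it is not MongoDB's Status class.
struct LaunchResult {
    bool ok;
    std::string message;
};

// Hypothetical wrapper: runs `spawn` (expected to start a worker thread) and,
// on failure, appends the caught exception's text to the generic message
// instead of discarding it.
LaunchResult launchWorker(const std::function<void()>& spawn) {
    const std::string prefix = "failed to create service entry worker thread: ";
    try {
        spawn();
        return {true, ""};
    } catch (const std::system_error& e) {
        // Keep both the human-readable description and the numeric error code.
        return {false,
                prefix + e.what() + " (code " + std::to_string(e.code().value()) + ")"};
    } catch (const std::exception& e) {
        return {false, prefix + e.what()};
    } catch (...) {
        // Nothing to extract from a non-standard exception, but say so explicitly.
        return {false, prefix + "unknown non-standard exception"};
    }
}
```

A caller could pass a lambda such as `[] { std::thread(work).detach(); }`, where `work` is a placeholder for the session loop. The `std::thread` constructor throws `std::system_error` when the thread cannot be started, so the resulting message would carry that error's description rather than only the generic text.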
| Comments |
| Comment by G F [ 03/Oct/21 ] |
|
Excuse me, may I ask a question: is it conceivable that launchServiceWorkerThread spawns a new thread for each command received from a client within one continuous session, at least when compiled for Windows rather than Linux? Surely I have misunderstood? |
| Comment by Githook User [ 07/Jul/20 ] |
|
Author: Andrew Chen <a.chen@mongodb.com> (AndrooTheChen) Message: |
| Comment by Benjamin Caimano (Inactive) [ 01/Jun/20 ] |
|
oleg.pudeyev, there should also be this log statement, which does specify the reason behind the failure. That said, with the try-catch there is a small chance that we swallow an exception that was not thrown from those lines. I think this is separate from |
| Comment by Bruce Lucas (Inactive) [ 30/May/20 ] |
|
Looks like we'll be touching this log line soon as part of |
| Comment by Oleg Pudeyev (Inactive) [ 30/May/20 ] |
|
Test code: https://github.com/p-mongo/tests/blob/master/connect-limit/test.rb

Not relevant to this ticket, but connections to each server are established sequentially, while the three servers are connected to in parallel.

Output: https://gist.github.com/p-mongo/aaf3a9351e46c2b06bf25f6d3b5c4ee1

It seems that the server fails in the manner indicated when the total number of connections in the system reaches about 10,000. The connections can be split evenly or unevenly across the server processes. Sometimes a particular server does not fail when the total reaches 10,000, possibly because it is not being connected to at that moment; by the time its next connection happens, the total has already dropped. |