[SERVER-29206] MongoS [NetworkInterfaceASIO-TaskExecutorPool-0-0] ExceededTimeLimit: Operation timed out
Created: 15/May/17  Updated: 08/Jan/24  Resolved: 21/Jun/17

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | 3.2.12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Wayne Egerer | Assignee: | Mira Carey |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Description |
We continue to receive the errors below in our mongos logs, which cause connection issues to our Mongo cluster. This happens throughout the day, every day. We have 4 mongos nodes, and all 4 see the issue; we are about to move to 8 to see if that helps.
Below is our configuration file:
We have tried a variety of options; none have helped:

With any of the config options above, we see the errors occur fairly quickly.
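As a purely illustrative sketch (not the reporter's actual configuration, which is not reproduced above), mongos connection-pool options of the kind discussed in this ticket are set through `setParameter` in the mongos YAML config file, for example:

```yaml
# Hypothetical illustration only -- values are assumptions, not the reporter's settings.
setParameter:
  taskExecutorPoolSize: 4                        # number of task-executor connection pools per mongos
  ShardingTaskExecutorPoolMinSize: 1             # connections kept warm per pool, per target host
  ShardingTaskExecutorPoolHostTimeoutMS: 300000  # how long a pool to an idle host is retained
```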
| Comments |
| Comment by Mira Carey [ 21/Jun/17 ] |

Now that we've picked a release for the backport of the fix, I'm resolving this ticket as a duplicate.

If you have any questions, feel free to re-open or open a new ticket as needed.
| Comment by Mira Carey [ 16/May/17 ] |

You may need to go down a bit further, if 200 for the max pool size is too large. Remember that the worst case is effectively N cores * Pool size * Num_mongos connections.

As a separate avenue, I'm exploring a rate limiter on new connection creation.
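To make that sizing arithmetic concrete, here is a hypothetical worked example; the pool count, pool size, and mongos count are assumptions for illustration, not this cluster's actual values:

```yaml
# Hypothetical worked example of N cores * Pool size * Num_mongos:
#
#   connection pools per mongos (N cores)  = 16
#   max connections per pool (Pool size)   = 200
#   number of mongos (Num_mongos)          = 4
#
#   worst case to a single mongod          = 16 * 200 * 4 = 12,800 connections
#   with the cap lowered to 50             = 16 * 50 * 4  =  3,200 connections
#
# The lowered figure is closer to the 3-4k concurrent operations a mongod can
# usefully run (see the conclusion in the 15/May comment below).
setParameter:
  ShardingTaskExecutorPoolMaxSize: 50   # hypothetical lowered per-pool, per-host cap
```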
| Comment by Wayne Egerer [ 16/May/17 ] |

I just tried this:

and

Still seeing the issue. I even waited over 5 minutes after restarting the mongos process to see if the errors would clear, but they did not. I saw these errors:

I was able to get this config in and it has not reported the errors so far, though I will continue to monitor:
| Comment by Mira Carey [ 15/May/17 ] |
I'm summarizing my thoughts at the top, and more directly responding to your comment below.
Responding to your comment in order:

Setting max pool size
Unfortunately, that's exactly the setting that you need in order to work around this problem. First, let me try to explain a little how setting the maximum size of the connection pool is likely to influence the running characteristics of your cluster, as well as where it is and is not dangerous (a concrete config sketch follows the scenarios below).

Suppositions

For all of these scenarios, presume:
Scenarios

No max pool size set

responsive db + fast operations
responsive db + fast operations + small number of slow ops
responsive db + large number of slow ops
temporarily unresponsive db
Max pool size set

responsive db + fast operations
responsive db + fast operations + some slow ops
responsive db + many slow ops
temporarily unresponsive db
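To make the "max pool size set" scenarios concrete, here is a hedged sketch of how such a cap is applied on each mongos; the value is illustrative, not a recommendation, and should come out of the sizing arithmetic in the 16/May comment above:

```yaml
# Hypothetical value -- once the cap is reached, additional requests wait on the
# mongos for a pooled connection to free up rather than each opening a new
# connection to the mongod (the queueing behaviour described in the scenarios above).
setParameter:
  ShardingTaskExecutorPoolMaxSize: 50
```

The same parameter can also be passed at startup, e.g. `mongos --setParameter ShardingTaskExecutorPoolMaxSize=50`.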
Conclusion

Setting max pool size is unlikely to worsen throughput, as actually using more than 3-4k or so threads on a mongod concurrently rarely scales. Things like p99 and p95 latency can theoretically worsen, but only do so compared to a working system if the max pool size is less than the desired number of long running operations (so that long running ops crowd out short ones).

Further server development
I asked about your configuration under these various settings mostly to understand whether it was possible that you were dropping connections due to periods of idleness. The efficacy of setting the host timeout very high seems to imply that you occasionally see no traffic at all to some hosts, followed by traffic spikes.

In terms of long term mongos development, this most strongly indicates that we need back pressure from mongod, to prevent mongos from opening so many connections at once. It may be worth noting that the only form this backpressure will be able to take is forcing requests to queue on the mongos (much like setting the max pool size). I don't think that more tweaking of this configuration is likely to help you without looking at setting the max pool size.

Scalability of configuration driven solutions
I suspect that the stable configuration Antonios has found is simply one that always keeps around enough connections that it rarely, if ever, needs to make new ones. In particular, without setting the max pool size, you have enough application pressure to swamp your shards. While this is an issue that we can and should work around in software, by slowing the rate of new connection creation in mongos, note that we'll still fail in periods of system unresponsiveness (we'll wait longer, but we'll still create too many concurrent operations on a mongod). Long term solutions will have to explore backpressure and queueing outside of mongod.