[SERVER-37752] possible bug in the server (ignoring timeout, using hardcoded timeout,..) ? Created: 25/Oct/18 Updated: 01/Nov/18 Resolved: 01/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code, Networking, Replication, Sharding |
| Affects Version/s: | 3.6.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michael Adamcik | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | I can't imagine how; it's a very big project. |
| Participants: |
| Description |
|
Hello everyone, it seems there is some hardcoded 20-second timeout. I came to this conclusion through my scripts: they end with an error on an update call, and the difference between the update call and the script exit is always exactly 20 seconds. It happens not only with $addToSet. I already connect to the server with very high timeout values (for exactly this debugging purpose); a sketch of such a connection is shown below.
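For reference, a minimal sketch of such a connection with the Perl driver. The host name, the database/collection names, and the exact timeout values are illustrative assumptions, not the settings from the original report:

```perl
use strict;
use warnings;
use MongoDB;

# Illustrative only: host and timeout values are assumptions.
# Every client-side timeout is set far above the observed
# 20-second cutoff, so the client should not be the side
# giving up first.
my $client = MongoDB::MongoClient->new(
    host               => 'mongodb://mongos.example.net:27017',
    connect_timeout_ms => 600_000,    # 10 minutes
    socket_timeout_ms  => 600_000,    # 10 minutes
);

# The kind of update observed to fail 20 seconds after the call.
my $coll = $client->ns('mydb.mycoll');
$coll->update_one(
    { _id => 42 },
    { '$addToSet' => { tags => 'example' } },
);
```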
So the question is: where does this timeout come from, and what can I do, apart from try/catch, to prevent these errors? I'm using MongoDB 3.6.5 and the Perl driver 2.0.1 on Debian stable. Kind regards, |
| Comments |
| Comment by Danny Hatcher (Inactive) [ 01/Nov/18 ] |
|
Hello Mike, I'm glad to hear it. Please note that these timings are longer than what most of our customers use, so you may wish to investigate your infrastructure to determine whether there are any upgrades you can make to improve performance. As this is not a bug in MongoDB, I will close this SERVER ticket. Thank you, Danny |
| Comment by Michael Adamcik [ 01/Nov/18 ] |
|
Hello Danny, I'm now testing with a second mongos instance that has adjusted parameters:
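The actual parameter values were not preserved in the ticket; the following is an illustrative sketch of what such an adjusted mongos startup might look like, with assumed values based on the parameters discussed in this thread:

```sh
# Illustrative values only, not the reporter's actual settings.
# A higher refresh timeout gives slow (high-iowait) shards more
# time to answer connection-pool refreshes; a longer host timeout
# and a larger minimum pool reduce how often refreshes are needed.
mongos --configdb <configReplSet/host1:port1,...> \
       --setParameter ShardingTaskExecutorPoolRefreshTimeoutMS=60000 \
       --setParameter ShardingTaskExecutorPoolHostTimeoutMS=600000 \
       --setParameter ShardingTaskExecutorPoolMinSize=10
```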
This seems to solve the issue so far.
Kind regards, Mike |
| Comment by Danny Hatcher (Inactive) [ 29/Oct/18 ] |
|
Hello Michael, Yes, it appears that this is not an issue with the Perl driver and is instead a problem with overloaded `mongod` instances. Since you pointed out in your original description that the timeout was always 20 seconds, I would start with `ShardingTaskExecutorPoolRefreshTimeoutMS`. If you change that to 30 seconds and then start seeing the errors in your application at a 30-second interval, that should prove it; a sketch of such a test is shown below. Can you test that change now? Thanks, Danny |
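A sketch of one way to run that test, assuming the parameter is applied at mongos startup; the 30-second value comes from the suggestion above, and the parameter casing follows the server documentation:

```sh
# Raise the connection-pool refresh timeout from its 20000 ms
# default to 30000 ms, keeping all other mongos options unchanged.
mongos --setParameter ShardingTaskExecutorPoolRefreshTimeoutMS=30000 \
       <existing mongos options>
```

If the application errors then arrive at a 30-second interval instead of 20, the pool refresh timeout is the mechanism being hit.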
| Comment by Michael Adamcik [ 27/Oct/18 ] |
|
Hello again, a little update: I know my mongod instances experience some heavy-load situations where the storage becomes the bottleneck (high iowait). Maybe in such situations the mongod processes become unable to accept new connections. So, after seeing the log, I'm now trying some options from https://docs.mongodb.com/v3.6/reference/parameters/#sharding-parameters (connection count, idle time, ...); one way to check the current values first is sketched below. If they help, it's not a bug, I guess.
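Before tuning, the running values can be read from a mongo shell connected to the mongos; the three parameters named here are an assumption about which of the documented sharding parameters govern connection count and idle time:

```js
// Each call returns the current value of one mongos parameter.
db.adminCommand({ getParameter: 1, ShardingTaskExecutorPoolRefreshTimeoutMS: 1 })  // defaults to 20000 ms
db.adminCommand({ getParameter: 1, ShardingTaskExecutorPoolHostTimeoutMS: 1 })     // idle pool lifetime per host
db.adminCommand({ getParameter: 1, ShardingTaskExecutorPoolMaxSize: 1 })           // connection count cap per host
```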
Your help and support are still highly appreciated! Kind regards, Michael |
| Comment by Michael Adamcik [ 26/Oct/18 ] |
|
Hello Danny, it just happened again; I uploaded the log of the mongos instance. The fun begins at 2018-10-26T20:53:52. Thank you, Michael |
| Comment by Danny Hatcher (Inactive) [ 26/Oct/18 ] |
|
Hello Michael, Unfortunately, I am unable to reproduce this problem in my own environment. Are you able to reproduce this issue consistently with a test script, or is it something that you only see occasionally as part of your main scripts? If you are connecting via a mongos node, it is possible that the error actually occurs at that level and is simply bubbling up to the driver. As you mentioned, I believe the mongos and mongod logs will be the key to resolving this problem. Can you upload a copy of the logs from the problematic mongos and the primary mongod if it happens again? Thank you, Danny |
| Comment by Michael Adamcik [ 25/Oct/18 ] |
|
Sorry for the lack of logs. I will add server logging to my mongos and mongod instances; maybe that will give some useful information. It could take a week, since these errors don't appear often. A sketch of how the verbosity might be raised is below. |
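A sketch of one way to raise log verbosity from the mongo shell on each instance; the choice of components is an assumption about what is relevant to connection timeouts:

```js
// Raise verbosity (default 0) for connection-related components.
db.setLogLevel(1, "network");
db.setLogLevel(1, "sharding");

// Set a component back to inheriting the global default
// once the error has been captured:
db.setLogLevel(-1, "network");
db.setLogLevel(-1, "sharding");
```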