[CDRIVER-541] Timeout used as specified by connectTimeoutMS will actually be less Created: 24/Feb/15 Updated: 19/Oct/16 Resolved: 01/Sep/16 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc, network |
| Affects Version/s: | 1.0.2 |
| Fix Version/s: | 1.5.0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Andrew Clayton | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | CentOS 6.x / x86-64 glibc 2.12 / gcc 4.4.7 |
| Description |
|
The timeout actually applied will be less than the value specified via the connectTimeoutMS option. In mongoc-client.c::mongoc_client_connect_tcp(), line 104, we have:

expire_at = bson_get_monotonic_time () + (connecttimeoutms * 1000L);

That value is eventually passed down to mongoc-socket.c::_mongoc_socket_wait(), where line 133 does:

timeout = (int)((expire_at - bson_get_monotonic_time ()) / 1000L);

Obviously some amount of time will have passed by this point, so the timeout will be less than what was specified. I'm seeing some quite large variations: with connectTimeoutMS=100 I have seen the actual timeout go as low as 2.

Also, the timeout argument to poll(2) is simply a duration in milliseconds, as quoted from the man page: "The timeout argument specifies the minimum number of milliseconds that poll() will block."

Is there any reason not to just pass the connectTimeoutMS value as is?

Cheers, |
| Comments |
| Comment by Githook User [ 01/Sep/16 ] | ||||
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: Also restart the timer for each interface attempted, e.g. if the driver | ||||
| Comment by Andrew Clayton [ 30/Aug/16 ] | ||||
|
Ah, right you are. So yes, the next best thing would be to do the calculation at the last possible moment. | ||||
| Comment by A. Jesse Jiryu Davis [ 30/Aug/16 ] | ||||
|
Thanks Andrew, unfortunately this signature is now in our public API:
... so we can't change it; we have to convert from timeoutms to expire_at and back to a timeout in milliseconds. | ||||
| Comment by Andrew Clayton [ 30/Aug/16 ] | ||||
|
So it's been a while since I last looked at this, but I'll ask my last point again. Seeing as the timeout argument to poll(2) is simply a duration in milliseconds, as quoted from the man page: "The timeout argument specifies the number of milliseconds that poll() should block waiting for a file descriptor to become ready."

So would the simple (and correct) fix be to just pass the specified timeout value straight to poll?

Cheers, | ||||
| Comment by A. Jesse Jiryu Davis [ 30/Aug/16 ] | ||||
|
It appears this code has not changed. The variations you see must be the result of calculating expire_at before calling getaddrinfo. So the connect timeout applies to the getaddrinfo call plus the actual TCP connect. As you say, getaddrinfo's duration is unpredictable and some time can pass before we call connect(). I think the answer is to calculate expire_at after calling getaddrinfo. This is more like how other drivers work and probably matches the purpose of connect timeout: to distinguish between a slow server and an unavailable one, based on what you know about your network's typical latency. | ||||
| Comment by A. Jesse Jiryu Davis [ 22/Oct/15 ] | ||||
|
Need to check this over again; timeouts and the whole network layer of the driver have been overhauled twice since this bug was reported. |