[CDRIVER-541] Timeout used as specified by connectTimeoutMS will actually be less Created: 24/Feb/15  Updated: 19/Oct/16  Resolved: 01/Sep/16

Status: Closed
Project: C Driver
Component/s: libmongoc, network
Affects Version/s: 1.0.2
Fix Version/s: 1.5.0

Type: Bug Priority: Minor - P4
Reporter: Andrew Clayton Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.x / x86-64

glibc 2.12 / gcc 4.4.7



 Description   

The value used as specified by the connectTimeoutMS option you specify will actually be less.

In mongoc-client.c::mongoc_client_connect_tcp() we have

104 expire_at = bson_get_monotonic_time () + (connecttimeoutms * 1000L);

That value is then eventually passed down to mongoc-socket.c::_mongoc_socket_wait() where we have

133 timeout = (int)((expire_at - bson_get_monotonic_time ()) / 1000L);

Obviously at this point some amount of time will have already passed, so the time out will be less than what was specified. I'm seeing some quite large variations, with a connectTimeoutMS=100 I have seen the actual time out as low as 2.

Also, seeing as the time out argument to poll(2) is simply a duration in milliseconds, as quoted from the man page

"The timeout argument specifies the minimum number of milliseconds that
poll() will block."

Is there any reason to not just pass the connectTimeoutMS value as is?

Cheers,
Andrew



 Comments   
Comment by Githook User [ 01/Sep/16 ]

Author:

{u'username': u'ajdavis', u'name': u'A. Jesse Jiryu Davis', u'email': u'jesse@mongodb.com'}

Message: CDRIVER-541 start connectTimeoutMS after getaddrinfo

Also restart the timer for each interface attempted, e.g. if the driver
attempts IPv6, then IPv4.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/9cec261599ce928e1151fd5dbf7360fe96265b85

Comment by Andrew Clayton [ 30/Aug/16 ]

Ah, right you are. So yes, the next best thing would be to do the calculation at the last possible moment.

Comment by A. Jesse Jiryu Davis [ 30/Aug/16 ]

Thanks Andrew, unfortunately this signature is now in our public API:

int              mongoc_socket_connect    (mongoc_socket_t       *sock,
                                           const struct sockaddr *addr,
                                           socklen_t              addrlen,
                                           int64_t                expire_at);

... so we can't change it, we have to convert from timeoutms to expire_at and back to a timeout in milliseconds.

Comment by Andrew Clayton [ 30/Aug/16 ]

So it's been a while since I last looked at this. But I'll ask my last point again.

Seeing as the timeout argument to poll(2) is simply a duration in milliseconds, as quoted from the man page

"The timeout argument specifies the number of milliseconds that poll()
should block waiting for a file descriptor to become ready."

So would the simple (and correct) fix be to just pass the specified timeout value straight to poll?

Cheers,
Andrew

Comment by A. Jesse Jiryu Davis [ 30/Aug/16 ]

It appears this code has not changed. The variations you see must be the result of calculating expire_at before calling getaddrinfo. So the connect timeout applies to the getaddrinfo call plus the actual TCP connect. As you say, getaddrinfo's duration is unpredictable and some time can pass before we call connect().

I think the answer is to calculate expire_at after calling getaddrinfo. This is more like how other drivers work and probably matches the purpose of connect timeout: to distinguish between a slow server and an unavailable one, based on what you know about your network's typical latency.

Comment by A. Jesse Jiryu Davis [ 22/Oct/15 ]

Need to check this over again, timeouts and the whole network layer of the driver have been overhauled twice since this bug was reported.

Generated at Wed Feb 07 21:09:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.