[SERVER-27232] Refresh and Setup timeouts in the ASIO connpool can prematurely time out an operation Created: 30/Nov/16  Updated: 17/Apr/17  Resolved: 08/Dec/16

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: None
Fix Version/s: 3.2.12, 3.4.1, 3.5.1

Type: Bug Priority: Major - P3
Reporter: Mira Carey Assignee: Mira Carey
Resolution: Done Votes: 0
Labels: platforms-hocr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
    Backports
        Depends
    Duplicate
        is duplicated by SERVER-26722 router blocks and throws ExceededTime... Closed
    Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4, v3.2
Sprint: Platforms 2017-01-23
Participants:

 Description   

Initial connects and later refreshes have a timeout associated with them in ASIO that isn't linked to any user-generated timeout. When these timeouts trigger, however, they are registered as general failures, and general failures cause us to dump all connections from the pool (propagating that error to all consumers currently waiting for a connection).

That scheme is sound for actual I/O errors (because a failure to RPC on one connection almost certainly means something is badly wrong with all other open connections), but it causes us to fail early and often when applied to timeouts.

The fix is to treat timeouts on connect and refresh lightly (start connecting a new connection on timeout) and allow the general request timeouts to handle timing out user requests.
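As a rough illustration of that change in behavior, here is a minimal sketch of per-host failure handling. This is not MongoDB's actual executor::ConnectionPool code; the FailureKind enum, the HostPool class, and its methods are hypothetical names chosen for the example. The point is only that, after the fix, a connect or refresh timeout replaces the one connection that timed out, while genuine I/O errors still purge the pool and fail waiting requests:

#include <iostream>

// Hypothetical error categories; the real code distinguishes these via its own error codes.
enum class FailureKind { NetworkError, SetupTimeout, RefreshTimeout };

// Minimal stand-in for a per-host connection sub-pool.
class HostPool {
public:
    void handleFailure(FailureKind kind) {
        switch (kind) {
        case FailureKind::SetupTimeout:
        case FailureKind::RefreshTimeout:
            // Timeout while connecting or refreshing: the host may still be
            // healthy, so just start another connection attempt and let each
            // waiting request's own timeout decide when to give up.
            spawnConnection();
            break;
        case FailureKind::NetworkError:
            // A real I/O error on one connection strongly suggests the other
            // open connections to this host are bad too, so drop everything
            // and propagate the error to waiting consumers.
            dropAllConnections();
            failAllWaiters();
            break;
        }
    }

private:
    void spawnConnection()    { std::cout << "starting replacement connection\n"; }
    void dropAllConnections() { std::cout << "dropping all pooled connections\n"; }
    void failAllWaiters()     { std::cout << "failing all queued requests\n"; }
};

int main() {
    HostPool pool;
    pool.handleFailure(FailureKind::RefreshTimeout);  // connection replaced, waiters unaffected
    pool.handleFailure(FailureKind::NetworkError);    // pool purged, waiters failed
}

Before the fix, the setup/refresh timeout cases fell into the same branch as a network error, which is why a single slow handshake could fail every request queued on that host.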



 Comments   
Comment by Githook User [ 08/Dec/16 ]

Author: Jason Carey <jcarey@argv.me> (hanumantmk)

Message: SERVER-27232 Fix early timeout in ASIO connpool

Initial connects and later refreshes have a timeout associated with them
in ASIO that isn't linked to any user generated timeout. These
timeouts, when they trigger, are registered as general failures however.
And general failures cause us to dump all connections from the pool
(propagating that error to all consumers currently waiting for a
connection).

That scheme is sound for actual io errors (because a failure to rpc on
one connection almost certainly means something is badly wrong with all
other open connections), but causes us to fail early and often when
applied to timeouts.

The fix is to treat timeouts on connect and refresh lightly (start
connecting a new connection on timeout) and allow the general request
timeouts to handle timing out user requests.

(cherry picked from commit 78f62c485a390f79c84baea51d840aaa8fb9c999)
Branch: v3.4
https://github.com/mongodb/mongo/commit/743aaabc8aa4600599a79f6ef056a8e9e02e0fc6

Comment by Githook User [ 08/Dec/16 ]

Author: Jason Carey <jcarey@argv.me> (hanumantmk)

Message: SERVER-27232 Fix early timeout in ASIO connpool

Initial connects and later refreshes have a timeout associated with them
in ASIO that isn't linked to any user generated timeout. These
timeouts, when they trigger, are registered as general failures however.
And general failures cause us to dump all connections from the pool
(propagating that error to all consumers currently waiting for a
connection).

That scheme is sound for actual io errors (because a failure to rpc on
one connection almost certainly means something is badly wrong with all
other open connections), but causes us to fail early and often when
applied to timeouts.

The fix is to treat timeouts on connect and refresh lightly (start
connecting a new connection on timeout) and allow the general request
timeouts to handle timing out user requests.

(cherry picked from commit 78f62c485a390f79c84baea51d840aaa8fb9c999)
Branch: v3.2
https://github.com/mongodb/mongo/commit/3ca494dabcbea4643bce8b6414de5559036da990

Comment by Githook User [ 08/Dec/16 ]

Author: Jason Carey <jcarey@argv.me> (hanumantmk)

Message: SERVER-27232 Fix early timeout in ASIO connpool

Initial connects and later refreshes have a timeout associated with them
in ASIO that isn't linked to any user generated timeout. These
timeouts, when they trigger, are registered as general failures however.
And general failures cause us to dump all connections from the pool
(propagating that error to all consumers currently waiting for a
connection).

That scheme is sound for actual io errors (because a failure to rpc on
one connection almost certainly means something is badly wrong with all
other open connections), but causes us to fail early and often when
applied to timeouts.

The fix is to treat timeouts on connect and refresh lightly (start
connecting a new connection on timeout) and allow the general request
timeouts to handle timing out user requests.
Branch: master
https://github.com/mongodb/mongo/commit/78f62c485a390f79c84baea51d840aaa8fb9c999
