[SERVER-27043] Command takes much longer to fail than the set timeout, when waiting for a connection from the connection pool Created: 15/Nov/16  Updated: 07/Dec/17  Resolved: 13/Nov/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Dianna Hohensee (Inactive) Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

Time-dependent tests are failing because the ASIO layer does not enforce timeouts while a command waits for a connection from the connection pool.

Two examples from the logs follow. The first took about 91 seconds against a 10-second timeout; the second took about 182 seconds against a 30-second timeout:

[js_test:remove2] 2016-11-10T19:01:28.944+0000 d20511| 2016-11-10T19:00:47.542+0000 I REPL     [replication-3] Restarting oplog query due to error: ExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 91286ms, timeout was set to 10000ms. Last fetched optime (with hash): { ts: Timestamp 1478804164000|30, t: 3 }[-9208730938153294485]. Restarts remaining: 3

[js_test:remove2] 2016-11-10T19:01:28.951+0000 d20511| 2016-11-10T19:01:19.420+0000 I NETWORK  [shard registry reload] Marking host ip-10-5-194-11:20514 as failed :: caused by :: ExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 182264ms, timeout was set to 30000ms
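To quantify how far past the deadline each command ran, the "took ...ms, timeout was set to ...ms" fragment in these messages can be pulled out mechanically. The following is a minimal sketch, not part of any MongoDB tooling: the function name `timeout_overshoot` and the regular expression are illustrative assumptions based only on the two log lines quoted above.

```python
import re

# Assumed pattern: matches the "took <N>ms, timeout was set to <M>ms"
# fragment as it appears in the two log lines quoted in this ticket.
TIMEOUT_RE = re.compile(r"took (\d+)ms, timeout was set to (\d+)ms")


def timeout_overshoot(log_line):
    """Return (took_ms, timeout_ms, ratio), or None if the fragment is absent."""
    match = TIMEOUT_RE.search(log_line)
    if match is None:
        return None
    took_ms = int(match.group(1))
    timeout_ms = int(match.group(2))
    return took_ms, timeout_ms, took_ms / timeout_ms


# Message text copied from the log excerpts above (prefixes trimmed).
lines = [
    "ExceededTimeLimit: Remote command timed out while waiting to get a "
    "connection from the pool, took 91286ms, timeout was set to 10000ms.",
    "ExceededTimeLimit: Remote command timed out while waiting to get a "
    "connection from the pool, took 182264ms, timeout was set to 30000ms",
]

for line in lines:
    print(timeout_overshoot(line))
```

For the two messages above, the commands ran roughly 9 times and 6 times longer than their configured timeouts.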



 Comments   
Comment by Samantha Ritter (Inactive) [ 13/Nov/17 ]

Because we haven't seen this in a year, and we have never been able to reproduce this behavior, I am going to close this ticket as "gone away."

Comment by Dianna Hohensee (Inactive) [ 21/Apr/17 ]

SERVER-26859 was supposed to fix a situation in the async results merger where ASIO callback threads were blocked. I would expect that fix to have worked rather than caused more problems. We aren't currently aware of anything else in sharding that blocks ASIO callback threads.
