[SERVER-22748] NetworkInterfaceASIO opens new connection for each request Created: 18/Feb/16  Updated: 13/Apr/20  Resolved: 15/Sep/16

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 3.2.3, 3.3.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Mira Carey
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-22077 Improve the runtime of sharding js tests Closed
Backwards Compatibility: Fully Compatible
Operating System: Windows
Sprint: Platforms 12 (04/01/16), Platforms 13 (04/22/16), Platforms 14 (05/13/16), Platforms 15 (06/03/16), Platforms 16 (06/24/16), Platforms 17 (07/15/16), Platforms 18 (08/05/16), Platforms 2016-08-26, Platforms 2016-09-19
Participants:
Linked BF Score: 0

 Description   

NOTE: I have only tried to reproduce this on Windows DEBUG.

It looks like the ASIO network interface fails to properly reuse the pooled connections and tries to run AsyncStream::connect for each request. Furthermore, it looks like AsyncStream::connect takes 2 seconds to return, which significantly slows down single-threaded scripts such as our JS tests.

I discovered this trying to figure out why find operations block for 2 seconds, but write commands don't. Find operations in 3.2 go through the network interface and write commands still use the legacy client.

After adding some instrumentation to the code to print the connect time, I observed that each find operation executes the connect code, which takes about 2 seconds. Here is an example of the output. Note the lines which say 'Connect completed in 2002 ms'.

> assert.eq(2, st.s0.getDB('TestDB').TestColl.find().itcount());
s20004| 2016-02-18T18:49:42.339-0500 D ASIO     [NetworkInterfaceASIO-0] Connecting to kaloianmdesktop:20000
---> s20004| 2016-02-18T18:49:44.341-0500 D ASIO    [NetworkInterfaceASIO-0] Connect completed in 2002 ms
d20000| 2016-02-18T18:49:44.341-0500 I NETWORK  [initandlisten] connection accepted from 10.4.123.233:64094 #17 (4 connections now open)
s20004| 2016-02-18T18:49:44.353-0500 I ASIO     [NetworkInterfaceASIO-0] Successfully connected to kaloianmdesktop:20000
 
> assert.eq(2, st.s0.getDB('TestDB').TestColl.find().itcount());
s20004| 2016-02-18T18:49:47.202-0500 D ASIO     [NetworkInterfaceASIO-0] Connecting to kaloianmdesktop:20000
---> s20004| 2016-02-18T18:49:49.206-0500 D ASIO     [NetworkInterfaceASIO-0] Connect completed in 2003 ms
d20000| 2016-02-18T18:49:49.206-0500 I NETWORK  [initandlisten] connection accepted from 10.4.123.233:64097 #18 (5 connections now open)
s20004| 2016-02-18T18:49:49.213-0500 I ASIO     [NetworkInterfaceASIO-0] Successfully connected to kaloianmdesktop:20000

Note that both requests are going through NetworkExecutor-0, so opening new connections cannot be attributed to the usage of multiple network executors for scaling.

I believe this is what is causing the slow down in the Windows sharding suites.



 Comments   
Comment by Mira Carey [ 15/Sep/16 ]

I've never been able to repro this and connection pool behavior doesn't appear to be degenerate to the degree expressed in this ticket in release builds, so closing gone away.

Generated at Thu Feb 08 04:01:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.