[SERVER-33264] networkAsio create connection very slow Created: 12/Feb/18  Updated: 11/Apr/18  Resolved: 09/Mar/18

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: shawn Assignee: Dmitry Agranat
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File connection_spike_server_33264.png    
Participants:

 Description   

Hi

as described in SERVER-32916 ,
we set ShardingTaskExecutorPoolMinSize: 8 and taskExecutorPoolSize: 12 ,
finally, we still find some slow connection threads, which took 3008ms .
and ,these slow connections were connecting to the same mongod instance.

2018-02-12T14:54:11.525+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-7-0] Successfully connected to 10.136.126.4:27000, took 231ms (11 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:11.640+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-7-0] Successfully connected to 10.136.126.4:27000, took 201ms (11 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:14.488+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-7-0] Successfully connected to 10.136.126.4:27000, took 3003ms (11 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:22.433+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Successfully connected to 10.136.126.4:27000, took 265ms (10 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:22.447+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-1-0] Successfully connected to 10.136.126.4:27000, took 266ms (12 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:22.479+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-1-0] Successfully connected to 10.136.126.4:27000, took 75ms (12 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:22.493+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-1-0] Successfully connected to 10.136.126.4:27000, took 66ms (12 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:25.274+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Successfully connected to 10.136.126.4:27000, took 3004ms (10 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:25.289+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-5-0] Successfully connected to 10.136.126.4:27000, took 3007ms (9 connections now open to 10.136.126.4:27000)
2018-02-12T14:54:25.327+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-1-0] Successfully connected to 10.136.126.4:27000, took 3005ms (12 connections now open to 10.136.126.4:27000)

Thanks.



 Comments   
Comment by shawn [ 11/Apr/18 ]

Hi Dima

1. The reason of enable heapProfiler is that, we met this issue issue 22224 . but we didn't found this is a bug ,so we enable heapProfiler .
With the help of Bruce Lucas , we get the root cause of this issue, and we didn't restart this mongod instance.

2. issue of connections
we choose a mongo instance , set these parameter: ShardingTaskExecutorPoolMinSize: 8 and taskExecutorPoolSize: 12 , restart it.
the next day, using this shell "tail -200000 /path to mongos.log |awk '/took/{gsub(/ms/,"", $10); if($10>3){print $16}}'|sort |uniq -c ", founding:
2872 IP1:config replica set port)
2640 IP2:config replica set port)

using the same shell on the host which doesn't set parameter, it showed:
17 IP3:config replica set port)
28 IP1:config replica set port)
56 IP2:config replica set port)
222 IP4:data replica port)

we can see, the slow connection to the config replica set increase sharply, but the slow connection to mongod instance disappear.

Is there anything i missed ?

Thanks.

Comment by Dmitry Agranat [ 28/Feb/18 ]

Hi shawn001,

Thank you for providing the requested information. It appears that the reported connection creation was slow because of the mongod process being slow.

  • The time in the above graph is in UTC
  • Until A and after B, we are serving ~10K ops
  • At A, in a matter of a few seconds, ~10K connections were created. Please also note a very high kernel CPU during this time
  • Between A and B, we can see the drop in the ops and the reported slow connections
  • After B, the connections rate returned to 0.5K connections per second and the ops rate returned to normal

As a side note, could you please clarify why you are running with heapProfiler enabled?

Thanks,
Dima

Comment by shawn [ 26/Feb/18 ]

HI

the complete logs as following.

metrics.tar.gz

mongod.log

mongos.log

Comment by Kelsey Schubert [ 12/Feb/18 ]

Hi shawn001,

Would you please upload the complete logs?

Thank you,
Kelsey

Generated at Thu Feb 08 04:32:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.