Details
-
Bug
-
Resolution: Works as Designed
-
Major - P3
-
None
-
3.4.18
-
None
-
ALL
-
Hide
1. my environment is
10.200.20.21 client10.200.20.59:2710 mongos10.200.20.64:2701 mongodmy client is a go program that do a single query
package mainimport ("log""time""github.com/globalsign/mgo""gopkg.in/mgo.v2/bson")func main() {s, err := mgo.Dial("10.200.20.59:2710")if err != nil {panic(err)}s.SetSocketTimeout(1.5e9)coll := s.DB("beta_rs").C("rs")var ret interface{}log.Println("start")err = coll.Find(bson.M{"_id": "8h1hkp:BatchUploadFiles_98Dk5t7Mkb"}).One(&ret)log.Println(ret, err)time.Sleep(100e9)}2. drop all connection from mongos to mongod
run this script on mongod's server:iptables -I qiniu-shield-chain 4 -s 10.200.20.59 -p tcp --dport 2701 -j DROP3. run the client, the query keeps about 1.5 second and meets SocketTimeout, then the client killed the connection.
qboxserver@cs1:~$ ./testrs2018/11/20 16:44:06 start2018/11/20 16:44:07 <nil> read tcp 10.200.20.21:20006->10.200.20.59:2710: i/o timeout^C4. But mongos keeps the connection 13 seconds, trys to connect to mongod 4 times.
2018-11-20T16:44:06.894+0800 I NETWORK [thread2] connection accepted from 10.200.20.21:20006 #18188 (14 connections now open)2018-11-20T16:44:06.895+0800 I NETWORK [conn18188] received client metadata from 10.200.20.21:20006 conn18188: { driver: { name: "mgo", version: "globalsign" }, os: { architecture: "amd64", type: "linux" } }2018-11-20T16:44:06.896+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:27012018-11-20T16:44:08.397+0800 I NETWORK [thread2] connection accepted from 10.200.20.21:25687 #18189 (15 connections now open)2018-11-20T16:44:08.397+0800 I NETWORK [conn18189] received client metadata from 10.200.20.21:25687 conn18189: { driver: { version: "globalsign", name: "mgo" }, os: { type: "linux", architecture: "amd64" } }2018-11-20T16:44:09.897+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out2018-11-20T16:44:09.897+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection2018-11-20T16:44:09.897+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out2018-11-20T16:44:09.898+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:27012018-11-20T16:44:12.901+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out2018-11-20T16:44:12.901+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection2018-11-20T16:44:12.901+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out2018-11-20T16:44:13.403+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:27012018-11-20T16:44:16.405+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out2018-11-20T16:44:16.405+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection2018-11-20T16:44:16.405+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out2018-11-20T16:44:16.907+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:27012018-11-20T16:44:19.909+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out2018-11-20T16:44:19.909+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection2018-11-20T16:44:19.909+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out2018-11-20T16:44:19.910+0800 I - [conn18188] end connection 10.200.20.21:20006 (15 connections now open)# entered ctrl+C on client here2018-11-20T16:44:26.151+0800 I - [conn18189] end connection 10.200.20.21:25687 (14 connections now open)5. When retrying, mongos keeps a CLOSE_WAIT socket with the client.
mongos 32131 qboxserver 34u IPv4 3719101316 0t0 TCP 10.200.20.59:2710->10.200.20.21:25057 (CLOSE_WAIT)mongos 32131 qboxserver 137u IPv4 3719114023 0t0 TCP 10.200.20.59:2710->10.200.20.21:39569 (ESTABLISHED)Show1. my environment is 10.200.20.21 client 10.200.20.59:2710 mongos 10.200.20.64:2701 mongod my client is a go program that do a single query package main import ( "log" "time" "github.com/globalsign/mgo" "gopkg.in/mgo.v2/bson" ) func main() { s, err := mgo.Dial("10.200.20.59:2710") if err != nil { panic(err) } s.SetSocketTimeout(1.5e9) coll := s.DB("beta_rs").C("rs") var ret interface{} log.Println("start") err = coll.Find(bson.M{"_id": "8h1hkp:BatchUploadFiles_98Dk5t7Mkb"}).One(&ret) log.Println(ret, err) time.Sleep(100e9) } 2. drop all connection from mongos to mongod run this script on mongod's server: iptables -I qiniu-shield-chain 4 -s 10.200.20.59 -p tcp --dport 2701 -j DROP 3. run the client, the query keeps about 1.5 second and meets SocketTimeout, then the client killed the connection. qboxserver@cs1:~$ ./testrs 2018/11/20 16:44:06 start 2018/11/20 16:44:07 <nil> read tcp 10.200.20.21:20006->10.200.20.59:2710: i/o timeout ^C 4. But mongos keeps the connection 13 seconds, trys to connect to mongod 4 times. 2018-11-20T16:44:06.894+0800 I NETWORK [thread2] connection accepted from 10.200.20.21:20006 #18188 (14 connections now open) 2018-11-20T16:44:06.895+0800 I NETWORK [conn18188] received client metadata from 10.200.20.21:20006 conn18188: { driver: { name: "mgo", version: "globalsign" }, os: { architecture: "amd64", type: "linux" } } 2018-11-20T16:44:06.896+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:2701 2018-11-20T16:44:08.397+0800 I NETWORK [thread2] connection accepted from 10.200.20.21:25687 #18189 (15 connections now open) 2018-11-20T16:44:08.397+0800 I NETWORK [conn18189] received client metadata from 10.200.20.21:25687 conn18189: { driver: { version: "globalsign", name: "mgo" }, os: { type: "linux", architecture: "amd64" } } 2018-11-20T16:44:09.897+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out 2018-11-20T16:44:09.897+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection 2018-11-20T16:44:09.897+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out 2018-11-20T16:44:09.898+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:2701 2018-11-20T16:44:12.901+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out 2018-11-20T16:44:12.901+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection 2018-11-20T16:44:12.901+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out 2018-11-20T16:44:13.403+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:2701 2018-11-20T16:44:16.405+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out 2018-11-20T16:44:16.405+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection 2018-11-20T16:44:16.405+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out 2018-11-20T16:44:16.907+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Connecting to 10.200.20.64:2701 2018-11-20T16:44:19.909+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Failed to connect to 10.200.20.64:2701 - HostUnreachable: Connection timed out 2018-11-20T16:44:19.909+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-35-0] Dropping all pooled connections to 10.200.20.64:2701 due to failed operation on a connection 2018-11-20T16:44:19.909+0800 I NETWORK [NetworkInterfaceASIO-TaskExecutorPool-35-0] Marking host 10.200.20.64:2701 as failed :: caused by :: HostUnreachable: Connection timed out 2018-11-20T16:44:19.910+0800 I - [conn18188] end connection 10.200.20.21:20006 (15 connections now open) # entered ctrl+C on client here 2018-11-20T16:44:26.151+0800 I - [conn18189] end connection 10.200.20.21:25687 (14 connections now open) 5. When retrying, mongos keeps a CLOSE_WAIT socket with the client. mongos 32131 qboxserver 34u IPv4 3719101316 0t0 TCP 10.200.20.59:2710->10.200.20.21:25057 (CLOSE_WAIT) mongos 32131 qboxserver 137u IPv4 3719114023 0t0 TCP 10.200.20.59:2710->10.200.20.21:39569 (ESTABLISHED)
Description
When a mongod is dropped, mongos keeps retrying to connect to mongod even if the client has killed the connection.
The mongos rapidly inrcrease its open files if mongos do not stop retrying to connnect to mongod if it has got a FIN from the client.