Type: Task
Resolution: Done
Priority: Minor - P4
Affects Version/s: 2.1.4
Component/s: MongoDB 3.2
Environment: AWS VM environment, 3-member mongo replica set across regions with an IPSEC tunnel
This is fairly hard to reproduce, so bear with me. Also, I'm not sure whether the problem is in the node driver or in mongodb itself, so I'm going to log it in both places.
The environment
----------------------
I have an environment split across two Amazon regions (OR and VA).
OR: 1 NAT, 1 SVC, 2 Mongo
VA: 1 NAT, 1 SVC, 1 Mongo (with priority = 0)
Both regions are connected by an ipsec tunnel managed by the 2 NAT machines, such that all traffic between regions goes through this tunnel.
The SVC machines each run 3 different node services, and each service keeps a connection pool open for its lifetime. The connection is made with the following url:
var url = 'mongodb://10.56.4.54:27017,10.72.3.114:27017,10.72.4.96:27017/mydatabase?maxPoolSize=100&replicaSet=replica-set-name'
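To make the scale of the setup concrete, here is a rough sketch that splits the url above into its hosts and options and estimates an upper bound on pooled connections. The helper names `parseMongoUrl` and `poolCeiling` are hypothetical and not part of the driver. The sketch assumes each service holds one pool of up to `maxPoolSize` sockets per server, which is my understanding of how the driver sizes pools rather than something verified here.

```javascript
// Sketch: split a multi-host MongoDB URL into hosts, database, and options.
// Handles only the simple form used in this report (no credentials).
function parseMongoUrl(url) {
  var match = /^mongodb:\/\/([^\/?]+)(?:\/([^?]*))?(?:\?(.*))?$/.exec(url);
  if (!match) throw new Error('unsupported MongoDB URL: ' + url);
  var options = {};
  (match[3] || '').split('&').filter(Boolean).forEach(function (pair) {
    var idx = pair.indexOf('=');
    var key = idx === -1 ? pair : pair.slice(0, idx);
    options[key] = idx === -1 ? '' : decodeURIComponent(pair.slice(idx + 1));
  });
  return {
    hosts: match[1].split(','),
    database: match[2] || '',
    options: options
  };
}

// ASSUMPTION: each client process keeps one pool of up to maxPoolSize
// sockets per server in the seed list.
function poolCeiling(url, clientCount) {
  var parsed = parseMongoUrl(url);
  var perServer = clientCount * parseInt(parsed.options.maxPoolSize, 10);
  return { perServer: perServer, total: perServer * parsed.hosts.length };
}

var url = 'mongodb://10.56.4.54:27017,10.72.3.114:27017,10.72.4.96:27017/mydatabase?maxPoolSize=100&replicaSet=replica-set-name';

// 2 SVC machines x 3 services each = 6 client processes.
var ceiling = poolCeiling(url, 6);
console.log(ceiling);
```

Under that assumption the ceiling is 600 sockets per mongo server, which gives a baseline to compare against the connection counts seen before and after a tunnel outage.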
It all seems to run fine until the IPSEC tunnel goes down. "mongostat --discover" shows roughly 600 connections spread across all 3 mongo servers.
When the tunnel goes down (even momentarily), the services in the OR region keep functioning correctly. The services in the VA region function for a little while, until their local Mongo realizes it's disconnected, at which point they just hang.
The problem is when the tunnel comes back up. At that point hundreds of connections are made ACROSS the tunnel, almost as if each bad connection in each connection pool were spawning a whole new connection pool. Of course, if my ulimit is set too low, mongo stops accepting connections. Mongo sees these open connections as idle (db.currentOp(true) shows them as active=false). Using lsof (or netstat) I can see that the extra connections are TCP sockets in ESTABLISHED state across the tunnel (from the mongo box they show as being opened to the ipsec box, NOT to their originating box on the other side of the tunnel). If I kill the services on the SVC boxes, the connections do go away.
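For anyone trying to confirm the same symptom, a small sketch like the one below can tally ESTABLISHED sockets per remote peer from netstat output captured on the mongo box. It assumes the Linux `netstat -tn` column layout, the function name `countEstablishedByPeer` is hypothetical, and the sample addresses are made up for illustration rather than taken from my environment.

```javascript
// Sketch: count ESTABLISHED TCP connections to mongod per remote peer.
// ASSUMPTION: input follows the Linux `netstat -tn` layout:
//   Proto Recv-Q Send-Q Local-Address Foreign-Address State
function countEstablishedByPeer(netstatOutput, mongoPort) {
  var counts = {};
  netstatOutput.split('\n').forEach(function (line) {
    var cols = line.trim().split(/\s+/);
    // Skip headers and anything that is not a TCP row.
    if (cols.length < 6 || cols[0].indexOf('tcp') !== 0) return;
    if (cols[5] !== 'ESTABLISHED') return;
    var local = cols[3];
    var foreign = cols[4];
    // Keep only sockets whose local end is the mongod port.
    if (local.slice(local.lastIndexOf(':') + 1) !== String(mongoPort)) return;
    var peer = foreign.slice(0, foreign.lastIndexOf(':'));
    counts[peer] = (counts[peer] || 0) + 1;
  });
  return counts;
}

// HYPOTHETICAL sample output; addresses are invented for illustration.
var sample = [
  'Proto Recv-Q Send-Q Local Address           Foreign Address         State',
  'tcp        0      0 10.0.0.5:27017          10.0.0.1:40112          ESTABLISHED',
  'tcp        0      0 10.0.0.5:27017          10.0.0.1:40113          ESTABLISHED',
  'tcp        0      0 10.0.0.5:27017          10.0.0.9:51200          ESTABLISHED',
  'tcp        0      0 10.0.0.5:27017          10.0.0.9:51201          TIME_WAIT',
  'tcp        0      0 10.0.0.5:22             10.0.0.7:50022          ESTABLISHED'
].join('\n');

var byPeer = countEstablishedByPeer(sample, 27017);
console.log(byPeer);
```

In a setup like mine, the peer with the runaway count would be the ipsec (NAT) box rather than the SVC box that actually owns the pool, since that is how the sockets appear from the mongo side.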
is duplicated by: SERVER-22280 Sockets/connections created and left hanging (Closed)