Core Server / SERVER-28974

Mongos leak connections to mongods

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: 3.2.12
    • Component/s: Networking
    • Labels: None
    • Operating System: ALL

      The exact trigger is unknown, but it has happened multiple times now during high-traffic peaks that have caused congestion in the internal network.

      We ran into an issue where mongos suddenly started creating a lot of connections to the mongods and ran out of file descriptors. This eventually consumed all descriptors on the mongods as well, which then crashed (https://jira.mongodb.org/browse/SERVER-28933).

      This happened when there was some intermittent network congestion.

      We saw errors like this when the leaks occurred:

      2017-04-24T21:48:41.865+0000 I NETWORK  [ReplicaSetMonitorWatcher] Socket recv() timeout  10.123.0.1:27017
      2017-04-24T21:48:41.865+0000 I NETWORK  [ReplicaSetMonitorWatcher] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [10.123.0.1:27017]
      2017-04-24T21:48:41.865+0000 I NETWORK  [ReplicaSetMonitorWatcher] Detected bad connection created at 1493026624910579 microSec, clearing pool for mongo5:27017 of 0 connections
      

      "0 connections" there is particularly interesting.

      Eventually the leaking led to running out of descriptors:

      2017-04-24T21:48:42.487+0000 I NETWORK  [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
      2017-04-24T21:48:42.487+0000 E NETWORK  [mongosMain] Out of file descriptors. Waiting one second before trying to accept more connections.
      

      The logs indicated that there were about 500 open client connections, but inspecting with lsof revealed fewer than 10. lsof also showed 6 connections to the config servers (the mongocs), 23 connections to the primary, and a total of 474 connections to the two secondaries, as well as 478 "sock" descriptors with "can't identify protocol":
      mongos 5495 mongodb 1023u sock 0,7 0t0 84718243 can't identify protocol

      Normally there are none of those, and there are only a few connections to the secondaries as well.
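
      For reference, this is roughly how the inspection above can be reproduced (just a sketch, assuming a single mongos process per host and that lsof is available; run it as root or the mongodb user):

      # Find the mongos process (assumes exactly one mongos on this host)
      MONGOS_PID=$(pgrep -x mongos)

      # Total descriptors currently held vs. the per-process limit
      ls /proc/$MONGOS_PID/fd | wc -l
      grep 'Max open files' /proc/$MONGOS_PID/limits

      # Leaked sockets show up as "can't identify protocol"
      lsof -n -p $MONGOS_PID | grep -c "can't identify protocol"

      # Established TCP connections grouped by remote endpoint (mongods / config servers)
      lsof -a -n -p $MONGOS_PID -iTCP -sTCP:ESTABLISHED \
          | awk 'NR > 1 { split($9, a, "->"); print a[2] }' | sort | uniq -c | sort -rn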

      This is currently a somewhat special setup: a single shard (a replica set of 3 nodes) with about 200 mongos instances connecting to it. The leaking occurred on all of the mongos instances (all of which were also affected by the network congestion).
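
      (For scale, if the other mongos instances leaked at a similar rate, 474 connections to the two secondaries from each of the roughly 200 mongoses works out to about 200 × 474 / 2 ≈ 47,000 inbound connections per secondary, which would be consistent with the mongods themselves running out of descriptors as in SERVER-28933.)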

      This is what connPoolStats showed around that time (although, as I understand it, it is not entirely reliable):

      MongoDB shell version: 3.2.12
      connecting to: test
      mongos> db.runCommand( { "connPoolStats" : 1 } )
      {
      	"numClientConnections" : 6,
      	"numAScopedConnections" : 0,
      	"totalInUse" : 478,
      	"totalAvailable" : 24,
      	"totalCreated" : 5000,
      	"hosts" : {
      		"mongocdata3:27019" : {
      			"inUse" : 0,
      			"available" : 1,
      			"created" : 1
      		},
      		"mongocdata2:27019" : {
      			"inUse" : 0,
      			"available" : 1,
      			"created" : 1
      		},
      		"mongocdata1:27019" : {
      			"inUse" : 0,
      			"available" : 1,
      			"created" : 1
      		},
      		"mongo7:27017" : {
      			"inUse" : 0,
      			"available" : 1,
      			"created" : 7
      		},
      		"mongo6:27017" : {
      			"inUse" : 0,
      			"available" : 1,
      			"created" : 4
      		},
      		"mongo5:27017" : {
      			"inUse" : 0,
      			"available" : 1,
      			"created" : 18
      		}
      	},
      	"replicaSets" : {
      		"datacfg" : {
      			"hosts" : [
      				{
      					"addr" : "mongocdata1:27019",
      					"ok" : true,
      					"ismaster" : false,
      					"hidden" : false,
      					"secondary" : true,
      					"pingTimeMillis" : 0
      				},
      				{
      					"addr" : "mongocdata2:27019",
      					"ok" : true,
      					"ismaster" : true,
      					"hidden" : false,
      					"secondary" : false,
      					"pingTimeMillis" : 0
      				},
      				{
      					"addr" : "mongocdata3:27019",
      					"ok" : true,
      					"ismaster" : false,
      					"hidden" : false,
      					"secondary" : true,
      					"pingTimeMillis" : 0
      				}
      			]
      		},
      		"rs1" : {
      			"hosts" : [
      				{
      					"addr" : "mongo5:27017",
      					"ok" : true,
      					"ismaster" : true,
      					"hidden" : false,
      					"secondary" : false,
      					"pingTimeMillis" : 0
      				},
      				{
      					"addr" : "mongo6:27017",
      					"ok" : true,
      					"ismaster" : false,
      					"hidden" : false,
      					"secondary" : true,
      					"pingTimeMillis" : 9
      				},
      				{
      					"addr" : "mongo7:27017",
      					"ok" : true,
      					"ismaster" : false,
      					"hidden" : false,
      					"secondary" : true,
      					"pingTimeMillis" : 4
      				}
      			]
      		}
      	},
      	"ok" : 1
      }
      
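
      Note that the totals above do not match the per-host counters (totalInUse is 478 while every per-host inUse is 0). A quick way to see that discrepancy is something like the following (a rough sketch, running the mongo shell from the command line against the mongos; the port is a placeholder):

      # Sketch: sum the per-host counters and compare them to the reported totals
      # (port 27017 is a placeholder for the actual mongos port)
      mongo --quiet --port 27017 --eval '
          var s = db.adminCommand({ connPoolStats: 1 });
          var inUse = 0, avail = 0, created = 0;
          for (var h in s.hosts) {
              inUse   += s.hosts[h].inUse;
              avail   += s.hosts[h].available;
              created += s.hosts[h].created;
          }
          print("per-host sums:   inUse=" + inUse + " available=" + avail + " created=" + created);
          print("reported totals: inUse=" + s.totalInUse + " available=" + s.totalAvailable + " created=" + s.totalCreated);
      '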

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Tuomas Silen (devastor)
            Votes: 1
            Watchers: 9
