  Core Server / SERVER-6297

Socket Exception code 9001


    Details

    • Operating System:
      ALL

      Description

      Hi.

      I have a sharded cluster using authentication. If I stop one of the config servers along with one of my data nodes, I start getting the following error whenever I connect to mongos and try to run any command:

      uncaught exception: error { "$err" : "socket exception", "code" : 9001 }
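The error comes back as a legacy error document with `$err` and `code` fields, and 9001 is the server's SocketException code. As a minimal client-side sketch, assuming the document has already been decoded into a Python dict, this is one way to recognize it. The helper `is_socket_exception` is a hypothetical name, not part of any driver API.

```python
import json

# Error code the server reports for a socket exception (as seen in this report).
SOCKET_EXCEPTION_CODE = 9001


def is_socket_exception(err_doc: dict) -> bool:
    """Return True if a legacy error document reports a socket exception.

    Hypothetical helper: checks the `code` field first and falls back to the
    `$err` message text when no code is present.
    """
    code = err_doc.get("code")
    if code is not None:
        return code == SOCKET_EXCEPTION_CODE
    return err_doc.get("$err") == "socket exception"


# The error document from this report, as the shell printed it.
raw = '{ "$err" : "socket exception", "code" : 9001 }'
doc = json.loads(raw)
print(is_socket_exception(doc))
```

Checking `code` before the message text keeps the test stable even if the wording of `$err` differs between server versions.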

      The problem appears to be worse in 2.0.6: there, shutting down just a single config server immediately produces socket exception errors.

      This appears to be related to SERVER-6178.
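The mongos log in this report suggests why losing one config server is enough. While handling the authenticate request, mongos opens a SyncClusterConnection to all three config servers, and the request ends in "socket exception" as soon as one of them (localhost:50030) is unreachable. The following is a toy Python model of that all-or-nothing behaviour. It is inferred from the log output, not taken from the mongos source, and `sync_cluster_connect` and `SocketException` are illustrative names.

```python
class SocketException(Exception):
    """Stand-in for the error the server reports with code 9001."""


def sync_cluster_connect(hosts, reachable):
    """Toy model of an all-or-nothing connection to the config servers.

    Assumption drawn from the log: every config host is contacted, and a
    single unreachable host makes the whole operation fail.
    """
    failed = [h for h in hosts if h not in reachable]
    if failed:
        raise SocketException(f"couldn't connect to server {failed[0]}")
    return list(hosts)


CONFIG_SERVERS = ["localhost:50010", "localhost:50020", "localhost:50030"]

# All three config servers up: the connection succeeds.
print(sync_cluster_connect(CONFIG_SERVERS, set(CONFIG_SERVERS)))

# One config server (localhost:50030) down: the whole connection fails.
try:
    sync_cluster_connect(CONFIG_SERVERS, {"localhost:50010", "localhost:50020"})
except SocketException as exc:
    print("socket exception:", exc)
```

Under this model, any request whose path goes through the synchronized config connection fails while even one config server is down, which matches the behaviour reported for 2.0.6.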

      Steps to reproduce:

      1. Create an authenticated sharded cluster with the following configuration:

      2 shards, each a replica set of 2 data nodes and 1 arbiter
      3 config servers
      1 mongos

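      2. Add an admin user

      3. Stop one config server

      4. Stop the secondary on one shard

      5. Wait a few minutes; the errors seem to start after the SyncClusterConnection to the config servers fails

      6. Attempt to connect to mongos and authenticate as the admin user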
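The topology from step 1 can be written down as a set of process command lines. This is a minimal Python sketch that assumes MongoDB 2.0-era `mongod`/`mongos` flags (`--configsvr`, `--shardsvr`, `--replSet`, `--keyFile`, `--configdb`). The config server ports come from the log in this report, and the mongos port 27017 appears in the distributed lock pinger's process ID. The shard ports, replica-set names, dbpaths and key file path are hypothetical. The sketch only builds and prints the commands. Creating the admin user (step 2) and configuring the replica sets with their arbiters still have to be done from the shell.

```python
from typing import List

# Config server and mongos ports as they appear in the log; everything else
# below (shard ports, set names, dbpaths, key file) is a hypothetical placeholder.
CONFIG_PORTS = [50010, 50020, 50030]
MONGOS_PORT = 27017
KEY_FILE = "/tmp/cluster.key"  # hypothetical; auth in a sharded cluster needs a shared key file


def cluster_commands(num_shards: int = 2, members_per_shard: int = 3) -> List[List[str]]:
    """Build argv lists for a 2-shard, 3-config-server, 1-mongos cluster.

    Each shard is a replica set of `members_per_shard` mongod processes. In the
    reported setup two are data-bearing and one is an arbiter; that role is set
    in the replica-set configuration, not on the command line.
    """
    cmds: List[List[str]] = []
    for port in CONFIG_PORTS:
        cmds.append(["mongod", "--configsvr", "--port", str(port),
                     "--dbpath", f"/data/config{port}", "--keyFile", KEY_FILE])
    for shard in range(num_shards):
        for member in range(members_per_shard):
            port = 20010 + 10 * shard + member  # hypothetical port scheme
            cmds.append(["mongod", "--shardsvr", "--replSet", f"shard{shard}",
                         "--port", str(port), "--dbpath", f"/data/shard{shard}_{member}",
                         "--keyFile", KEY_FILE])
    config_db = ",".join(f"localhost:{p}" for p in CONFIG_PORTS)
    cmds.append(["mongos", "--configdb", config_db, "--port", str(MONGOS_PORT),
                 "--keyFile", KEY_FILE])
    return cmds


for argv in cluster_commands():
    print(" ".join(argv))
```

To mirror steps 3 and 4, stop the config server on port 50030 (the one the log reports as unreachable) and one data-bearing secondary of one shard.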

      I turned up logging in mongos and got the following:

      Tue Jul 3 22:04:33 [mongosMain] connection accepted from 127.0.0.1:62779 #32
      Tue Jul 3 22:04:33 [conn32] authenticate: { authenticate: 1.0, user: "admin", nonce: "ee41c70123ef341", key: "dc29f18bef6f52f3abf09f0f49574f83" }

      Tue Jul 3 22:04:33 [conn32] DBClientCursor::init call() failed
      Tue Jul 3 22:04:33 [conn32] sharded connection to localhost:50010,localhost:50020,localhost:50030 not being returned to the pool
      Tue Jul 3 22:04:33 [conn32] end connection 127.0.0.1:62779
      Tue Jul 3 22:04:35 [ReplicaSetMonitorWatcher] trying reconnect to localhost:20020
      Tue Jul 3 22:04:35 [ReplicaSetMonitorWatcher] reconnect localhost:20020 failed couldn't connect to server localhost:20020
      Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connecting to [localhost:50010]
      Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connecting to [localhost:50020]
      Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connecting to [localhost:50030]
      Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
      Tue Jul 3 22:04:35 [LockPinger] trying reconnect to localhost:50030
      Tue Jul 3 22:04:35 [LockPinger] reconnect localhost:50030 failed couldn't connect to server localhost:50030
      Tue Jul 3 22:04:35 [LockPinger] warning: distributed lock pinger 'localhost:50010,localhost:50020,localhost:50030/Jeffs-MacBook-Air.local:27017:1341378125:16807' detected an exception while pinging. :: caused by :: socket exception
      Tue Jul 3 22:04:38 [mongosMain] connection accepted from 127.0.0.1:62792 #33
      Tue Jul 3 22:04:38 [conn33] authenticate: { authenticate: 1.0, user: "admin", nonce: "320e3c628074ca23", key: "9e5c499327208a089c7cdab6e16bd5dc" }

      Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50010]
      Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50020]
      Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50030]
      Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
      Tue Jul 3 22:04:38 [conn33] trying reconnect to localhost:50030
      Tue Jul 3 22:04:38 [conn33] reconnect localhost:50030 failed couldn't connect to server localhost:50030
      Tue Jul 3 22:04:38 [conn33] DBException in process: socket exception
      Tue Jul 3 22:04:38 [conn33] end connection 127.0.0.1:62792
      Tue Jul 3 22:04:40 [mongosMain] connection accepted from 127.0.0.1:62800 #34
      Tue Jul 3 22:04:40 [conn34] authenticate: { authenticate: 1.0, user: "admin", nonce: "6d642df666f58f77", key: "058a8a22d7a7d96b486bdce134e53026" }

      Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connecting to [localhost:50010]
      Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connecting to [localhost:50020]
      Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connecting to [localhost:50030]
      Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
      Tue Jul 3 22:04:40 [conn34] trying reconnect to localhost:50030
      Tue Jul 3 22:04:40 [conn34] reconnect localhost:50030 failed couldn't connect to server localhost:50030
      Tue Jul 3 22:04:40 [conn34] DBException in process: socket exception
      Tue Jul 3 22:04:40 [conn34] end connection 127.0.0.1:62800
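Several connections are interleaved in this log, so grouping the failures by connection context makes the pattern easier to see. This is a small Python sketch that assumes the line format shown above (date, time, `[context]`, message). `summarize` is a hypothetical helper, and its patterns only cover the messages that appear in this excerpt.

```python
import re
from collections import defaultdict

# Matches lines like "Tue Jul 3 22:04:38 [conn33] <message>".
LINE_RE = re.compile(r"^\w{3} \w{3} +\d+ [\d:]+ \[(?P<ctx>[^\]]+)\] (?P<msg>.*)$")
# Matches "SyncClusterConnection connect fail to: <host> ...".
CONNECT_FAIL_RE = re.compile(r"connect fail to: (?P<host>\S+)")


def summarize(lines):
    """Map each log context (e.g. 'conn33') to the config hosts it failed to
    reach and whether it logged a socket exception."""
    summary = defaultdict(lambda: {"failed_hosts": set(), "socket_exception": False})
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ctx, msg = m.group("ctx"), m.group("msg")
        fail = CONNECT_FAIL_RE.search(msg)
        if fail:
            summary[ctx]["failed_hosts"].add(fail.group("host"))
        if "socket exception" in msg:
            summary[ctx]["socket_exception"] = True
    return dict(summary)


SAMPLE = """\
Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50030]
Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
Tue Jul 3 22:04:38 [conn33] DBException in process: socket exception
Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
Tue Jul 3 22:04:40 [conn34] DBException in process: socket exception
"""

for ctx, info in sorted(summarize(SAMPLE.splitlines()).items()):
    print(ctx, sorted(info["failed_hosts"]), info["socket_exception"])
```

Run over the full mongos log, the same grouping also picks up the `[LockPinger]` context, which fails against the same config server.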

        Attachments

        1. auth_with_config_down.js (3 kB)
        2. mongos_local.log.gz (4 kB)


              People

              Assignee: Greg Studer (greg_10gen)
              Reporter: Jeff lee (jlee)
              Votes: 0
              Watchers: 8
