Core Server / SERVER-43211

mongos claims it is accepting connections but does not

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: Networking
    • Labels: None
    • Fully Compatible
    • ALL
    • Service Arch 2019-09-09, Service Arch 2019-09-23

      I started a 4.3 sharded deployment using the arguments in this script (https://github.com/p-mongo/dev/blob/master/script/launch-4.4-sharded-multishard):

      port=14440
      launchargs="--replicaset --nodes 2 --sharded 2 --name ruby-driver-rs --mongos 2"
      

      The server version is:

      butler% /usr/local/m/versions/4.4/mongos --version
      mongos version v4.3.0-574-g6e02a4d
      git version: 6e02a4d34bd972e6755bb5f71a5b26f69fe2cfb0
      OpenSSL version: OpenSSL 1.1.1c  28 May 2019
      allocator: tcmalloc
      modules: none
      build environment:
          distarch: x86_64
          target_arch: x86_64
      

      mlaunch produced this output:

      butler% ./script/launch-4.4-sharded-multishard
      Base port: 14440
      launching: config server on port 14446
      launching: "/usr/local/m/versions/4.4/mongod" on port 14442
      launching: "/usr/local/m/versions/4.4/mongod" on port 14443
      launching: "/usr/local/m/versions/4.4/mongod" on port 14444
      launching: "/usr/local/m/versions/4.4/mongod" on port 14445
      launching: /usr/local/m/versions/4.4/mongos on port 14440
      launching: /usr/local/m/versions/4.4/mongos on port 14441
      
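      The port assignments in this output follow a simple pattern relative to the base port: the two mongos processes take 14440 and 14441, the four shard mongods (2 shards x 2 nodes) take 14442 through 14445, and the config server takes 14446 even though it is launched first. The sketch below reproduces that layout, which is handy when probing each process individually. The function name `expected_ports` is hypothetical, and the ordering is inferred from this single run rather than from mlaunch's source, so treat it as an assumption.

```python
def expected_ports(base_port, mongos_count, shard_count, nodes_per_shard, config_count=1):
    """Map each role to its ports, in the order observed in the mlaunch output above.

    Assumption: mongos ports come first, then shard members, then config servers.
    """
    port = base_port
    layout = {"mongos": [], "shard": [], "config": []}
    for _ in range(mongos_count):
        layout["mongos"].append(port)
        port += 1
    for _ in range(shard_count * nodes_per_shard):
        layout["shard"].append(port)
        port += 1
    for _ in range(config_count):
        layout["config"].append(port)
        port += 1
    return layout


layout = expected_ports(14440, mongos_count=2, shard_count=2, nodes_per_shard=2)
print(layout)
# {'mongos': [14440, 14441], 'shard': [14442, 14443, 14444, 14445], 'config': [14446]}
```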

      The log from mongos is here: https://gist.github.com/p-mongo/a206f10247c39eaa92c63e1c0c977f72

      Note that it contains the following lines:

      2019-09-06T16:08:09.263-0400 I  NETWORK  [mongosMain] Listening on 127.0.0.1
      2019-09-06T16:08:09.263-0400 I NETWORK [mongosMain] waiting for connections on port 14440
      

      However, connection to 14440 fails:

      butler% mongo --port 14440
      MongoDB shell version v3.6.9
      connecting to: mongodb://127.0.0.1:14440/
      2019-09-06T16:10:39.341-0400 W NETWORK  [thread1] Failed to connect to 127.0.0.1:14440 after 5000ms milliseconds, giving up.
      2019-09-06T16:10:39.359-0400 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:14440, connection attempt failed :
      connect@src/mongo/shell/mongo.js:257:13
      @(connect):1:6
      exception: connect failed
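      One way to narrow down a failure like this is to check whether the port accepts a plain TCP connection at all, independent of the mongo shell. The sketch below (the helper name `probe` is hypothetical) distinguishes a refused connection (nothing listening) from a timeout (the connect did not complete within the deadline). A successful TCP connect only shows that the kernel completed the handshake on a listening socket; it does not prove that mongos will actually service the connection.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a plain TCP connect attempt (no MongoDB wire protocol involved)."""
    try:
        # create_connection() returns a connected socket; close it right away.
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        # The peer answered with a reset: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # The connect did not complete within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. an unreachable host.
        return f"error: {exc}"
```

      For example, `probe("127.0.0.1", 14440)` run against the stuck mongos would show which of these outcomes the shell's 5000 ms failure corresponds to.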
      

      Note also that the server log contains no errors. It contains a single warning:

      2019-09-06T16:08:07.259-0400 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
      
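      To confirm the absence of errors mechanically rather than by eye, the log can be scanned by severity. This sketch assumes the plain-text layout shown in this ticket (`<timestamp> <severity> <component> [<context>] <message>`, with single-letter severities such as I, W, E, F, and D, optionally followed by a debug-level digit); MongoDB 4.4 switched to structured JSON logs, so it would not apply to those. The helper name `summarize` and the file name `mongos.log` in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumed text log layout: <timestamp> <severity> <component> [<context>] <message>
LOG_LINE = re.compile(r"^(\S+)\s+([DIWEF]\d?)\s+(\S+)\s+\[([^\]]*)\]\s+(.*)$")


def summarize(lines):
    """Count lines per severity letter and collect warning, error, and fatal messages."""
    counts = Counter()
    problems = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that don't match, e.g. continuation lines of a backtrace
        _ts, sev, component, _context, msg = m.groups()
        counts[sev[0]] += 1  # fold debug levels D1..D5 into "D"
        if sev in ("W", "E", "F"):
            problems.append((sev, component, msg))
    return counts, problems
```

      Usage: `with open("mongos.log") as f: counts, problems = summarize(f)`, then inspect `counts["E"]` and `problems`.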

      I expect that if mongos claims to be accepting connections, it actually accepts them, and that it logs an error-level message when a connection attempt fails. If mongos is not yet accepting connections, I expect it to log what it is doing so that I can track its progress toward a usable state.

      Once my deployment gets into the state described in this ticket, it appears to stay stuck: killing and restarting all processes does not seem to recover it. I have to wipe the data directories for all mongos and mongod nodes and rebuild the entire deployment from scratch.

            Assignee: Mira Carey (mira.carey@mongodb.com)
            Reporter: Oleg Pudeyev (Inactive) (oleg.pudeyev@mongodb.com)
            Votes: 0
            Watchers: 5
