Core Server / SERVER-43211

mongos claims it is accepting connections but does not



    • Type: Bug
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Networking
    • Backwards Compatibility: Fully Compatible
    • Sprint: Service Arch 2019-09-09, Service Arch 2019-09-23


      I started a 4.3 sharded deployment using the mlaunch arguments from this script (https://github.com/p-mongo/dev/blob/master/script/launch-4.4-sharded-multishard):

      launchargs="--replicaset --nodes 2 --sharded 2 --name ruby-driver-rs --mongos 2"

      The server version is:

      butler% /usr/local/m/versions/4.4/mongos --version
      mongos version v4.3.0-574-g6e02a4d
      git version: 6e02a4d34bd972e6755bb5f71a5b26f69fe2cfb0
      OpenSSL version: OpenSSL 1.1.1c  28 May 2019
      allocator: tcmalloc
      modules: none
      build environment:
          distarch: x86_64
          target_arch: x86_64

      mlaunch produced this output:

      butler% ./script/launch-4.4-sharded-multishard
      Base port: 14440
      launching: config server on port 14446
      launching: "/usr/local/m/versions/4.4/mongod" on port 14442
      launching: "/usr/local/m/versions/4.4/mongod" on port 14443
      launching: "/usr/local/m/versions/4.4/mongod" on port 14444
      launching: "/usr/local/m/versions/4.4/mongod" on port 14445
      launching: /usr/local/m/versions/4.4/mongos on port 14440
      launching: /usr/local/m/versions/4.4/mongos on port 14441
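The `launching:` lines above can be turned into a port map, which makes it easy to confirm which ports should have a mongos behind them. A minimal Python sketch, assuming only the output format shown above; `parse_launch_output` and `mongos_ports` are hypothetical helper names, not part of mlaunch:

```python
import re

# Matches mlaunch progress lines as quoted in this ticket, e.g.
#   launching: config server on port 14446
#   launching: "/usr/local/m/versions/4.4/mongod" on port 14442
LAUNCH_RE = re.compile(r"^\s*launching:\s*(?P<what>.+?)\s+on port\s+(?P<port>\d+)\s*$")


def parse_launch_output(text: str) -> list[tuple[str, int]]:
    """Return (process, port) pairs from the 'launching:' lines, in order."""
    pairs = []
    for line in text.splitlines():
        m = LAUNCH_RE.match(line)
        if m:
            # Binary paths are sometimes quoted in the output; drop the quotes.
            pairs.append((m.group("what").strip('"'), int(m.group("port"))))
    return pairs


def mongos_ports(text: str) -> list[int]:
    """Ports on which a mongos binary was launched."""
    return [port for what, port in parse_launch_output(text) if what.endswith("/mongos")]
```

Applied to the full output above, this lists seven processes (one config server, four mongod processes, two mongos), with mongos expected on 14440 and 14441.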

      The log from mongos is here: https://gist.github.com/p-mongo/a206f10247c39eaa92c63e1c0c977f72

      Note that it contains the following lines:

      2019-09-06T16:08:09.263-0400 I  NETWORK  [mongosMain] Listening on
      2019-09-06T16:08:09.263-0400 I NETWORK [mongosMain] waiting for connections on port 14440

      However, connecting to port 14440 fails:

      butler% mongo --port 14440
      MongoDB shell version v3.6.9
      connecting to: mongodb://
      2019-09-06T16:10:39.341-0400 W NETWORK  [thread1] Failed to connect to after 5000ms milliseconds, giving up.
      2019-09-06T16:10:39.359-0400 E QUERY    [thread1] Error: couldn't connect to server, connection attempt failed :
      exception: connect failed
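The shell error above does not say whether the port refused the connection outright or simply never answered within the 5000 ms limit, and those two outcomes would point in different directions (nothing bound to the port versus a listener that is not completing connections). A raw TCP probe separates the cases. A minimal sketch using Python's standard `socket` module; `probe` is a hypothetical helper, and it performs only a TCP handshake, not a MongoDB handshake:

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a plain TCP connect attempt (no MongoDB command is sent).

    Returns "open" if the TCP handshake completes, "refused" if the peer
    actively rejects it, "timeout" if nothing answers within `timeout`
    seconds, or "error: ..." for any other socket failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Because the host in the shell's connection string is cut off above, probing `127.0.0.1` and `::1` separately can also show whether the listener is bound to only one address family.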

      Also note that the server log contains no errors. Its only warning is this one:

      2019-09-06T16:08:07.259-0400 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
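Whether a log contains errors can be checked mechanically rather than by eye. A minimal sketch that tallies lines by severity letter, assuming the text log layout visible in the lines quoted in this ticket (timestamp, severity, component, `[context]`, message); the regular expression is inferred from those lines, not taken from a MongoDB specification, and `severity_counts` is a hypothetical name:

```python
import re
from collections import Counter

# Layout inferred from the log lines quoted in this ticket (an assumption,
# not a documented format):
#   <timestamp> <severity> <component> [<context>] <message>
# Severity is assumed to be a letter (F, E, W, I, D), optionally followed by
# a debug-level digit.
LOG_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[FEWID]\d?)\s+(?P<component>\S+)\s+"
    r"\[(?P<ctx>[^\]]*)\]\s?(?P<msg>.*)$"
)


def severity_counts(log_text: str) -> Counter:
    """Count log lines per severity letter; lines that do not parse are skipped."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if m:
            counts[m.group("sev")[0]] += 1
    return counts
```

On the three mongos log lines quoted above it counts two `I` lines and one `W` line; on the full log, zero `E` and `F` lines would match the observation that no errors were logged.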

      I expect that if mongos claims to be accepting connections, it actually accepts them, and that it writes an error-level message to the log whenever a connection attempt fails. If mongos is not yet accepting connections, I expect it to log what it is doing so that I can track its progress toward a usable state.
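The expectation above can be phrased as a checkable invariant: every port the log advertises as "waiting for connections" should accept a connection within a bounded time, and a client waiting for readiness should be able to see its progress. A minimal Python sketch under two assumptions: a TCP-level connect is an adequate proxy (it cannot detect a listener that accepts sockets but never answers an `isMaster` command), and the helper names and the 30-second default deadline are hypothetical choices, not MongoDB behavior:

```python
import re
import socket
import time

# Matches the readiness claim quoted in this ticket, e.g.
#   "... [mongosMain] waiting for connections on port 14440"
WAITING_RE = re.compile(r"waiting for connections on port (\d+)")


def advertised_ports(log_text: str) -> list[int]:
    """Return every port the log claims to be waiting for connections on."""
    return [int(p) for p in WAITING_RE.findall(log_text)]


def wait_until_accepting(host: str, port: int, deadline_s: float = 30.0,
                         interval_s: float = 0.5) -> bool:
    """Retry a plain TCP connect until it succeeds or `deadline_s` elapses.

    Each failed attempt is printed, so progress (or the lack of it) is visible.
    Only the TCP handshake is checked; no MongoDB command is sent.
    """
    end = time.monotonic() + deadline_s
    attempt = 0
    while True:
        attempt += 1
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError as exc:  # refused, timed out, unreachable, ...
            print(f"attempt {attempt}: {host}:{port} not accepting ({exc})")
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

Run against the mongos log from this ticket, `advertised_ports` should report 14440; `wait_until_accepting` on that port (with whatever host the shell actually used) would then turn the mismatch into a visible series of failed attempts rather than a single shell error.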

      Once my deployment gets into the state described in this ticket, it appears to stay stuck there: killing and restarting all processes does not seem to unstick it. I have to wipe the data directories for all mongos and mongod nodes and rebuild the entire deployment from scratch.




            jason.carey Jason Carey
            oleg.pudeyev Oleg Pudeyev