- Type: Bug
- Resolution: Works as Designed
- Priority: Minor - P4
- None
- Affects Version/s: None
- Component/s: Networking
- Labels: None
- Fully Compatible
- ALL
- Service Arch 2019-09-09, Service Arch 2019-09-23
I started a 4.3 sharded deployment with the args specified here (https://github.com/p-mongo/dev/blob/master/script/launch-4.4-sharded-multishard):
port=14440
launchargs="--replicaset --nodes 2 --sharded 2 --name ruby-driver-rs --mongos 2"
The server is:
butler% /usr/local/m/versions/4.4/mongos --version
mongos version v4.3.0-574-g6e02a4d
git version: 6e02a4d34bd972e6755bb5f71a5b26f69fe2cfb0
OpenSSL version: OpenSSL 1.1.1c 28 May 2019
allocator: tcmalloc
modules: none
build environment:
    distarch: x86_64
    target_arch: x86_64
mlaunch produced this output:
butler% ./script/launch-4.4-sharded-multishard
Base port: 14440
launching: config server on port 14446
launching: "/usr/local/m/versions/4.4/mongod" on port 14442
launching: "/usr/local/m/versions/4.4/mongod" on port 14443
launching: "/usr/local/m/versions/4.4/mongod" on port 14444
launching: "/usr/local/m/versions/4.4/mongod" on port 14445
launching: /usr/local/m/versions/4.4/mongos on port 14440
launching: /usr/local/m/versions/4.4/mongos on port 14441
The log from mongos is here: https://gist.github.com/p-mongo/a206f10247c39eaa92c63e1c0c977f72
Note that it contains the following lines:
2019-09-06T16:08:09.263-0400 I NETWORK [mongosMain] Listening on 127.0.0.1
2019-09-06T16:08:09.263-0400 I NETWORK [mongosMain] waiting for connections on port 14440
However, connection to 14440 fails:
butler% mongo --port 14440
MongoDB shell version v3.6.9
connecting to: mongodb://127.0.0.1:14440/
2019-09-06T16:10:39.341-0400 W NETWORK [thread1] Failed to connect to 127.0.0.1:14440 after 5000ms milliseconds, giving up.
2019-09-06T16:10:39.359-0400 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:14440, connection attempt failed :
connect@src/mongo/shell/mongo.js:257:13
@(connect):1:6
exception: connect failed
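To narrow down a failure like the one above, a raw TCP probe can help separate "nothing is listening on the port" from "the connection attempt timed out". The following is only a minimal sketch using Python's standard `socket` module; `probe_port` is a hypothetical helper name, not part of mongos, the mongo shell, or mlaunch, and a result of "open" only means the TCP handshake succeeded, not that mongos is ready to serve requests.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Attempt a plain TCP connect and classify the outcome.

    Hypothetical diagnostic helper. It performs no MongoDB handshake,
    so "open" only shows that something accepted the TCP connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on this address and port.
        return "refused"
    except socket.timeout:
        # No response to the connect attempt within the timeout.
        return "timeout"
    except OSError as exc:
        # Any other socket-level failure (for example, unreachable host).
        return f"error: {exc}"
```

Calling `probe_port("127.0.0.1", 14440)` against the deployment in this ticket would indicate whether the connect attempt is refused outright or times out, which the shell error message alone does not make clear.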
Also note that the server log shows no errors. There is one warning:
2019-09-06T16:08:07.259-0400 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
I expect that when mongos reports that it is waiting for connections, it actually accepts them, and that it writes error-level messages to the log when a connection attempt fails. If mongos is not accepting connections, I expect it to log what it is doing so that I can track its progress toward a usable state.
When my deployment gets into the state described in this ticket, it appears to stay stuck there: killing all processes and restarting them does not seem to unstick it. I need to nuke the data directories for all mongos and mongod nodes and rebuild the entire deployment from scratch.