[SERVER-33790] Mongos v3.7.2-387 fails to connect to config server at startup on Windows and Mac
Created: 09/Mar/18  Updated: 23/Apr/18  Resolved: 14/Mar/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Shane Harvey
Assignee: Jonathan Reams
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-33894 TransportLayerASIO should log resolve... Closed
is related to SERVER-33883 Implement fallback from IPv6 to IPv4 ... Closed
Operating System: ALL
Participants:

 Description   

It appears that mongos cannot connect to the config server replica set when mongo-orchestration starts a sharded cluster without authentication and without SSL.

I can also reproduce this locally on Mac with mongodb version:

mongos version v3.7.2-387-g0d53707
git version: 0d5370783beeb4936a181dd2f69387da4b5e816c
OpenSSL version: OpenSSL 0.9.8zh 14 Jan 2016
allocator: system
modules: enterprise
build environment:
    distarch: x86_64
    target_arch: x86_64

This is the sequence of events:

  1. Mongo-orchestration successfully starts a single member config server replica set on "localhost:1026"
  2. Mongo-orchestration starts a single mongos
  3. mongos fails to connect to the config server:

    2018-03-08T21:35:16.137+0000 W NETWORK  [mongosMain] Unable to reach primary for set 82477b90-45d1-4735-ac9d-716451b6de27
    2018-03-08T21:35:16.137+0000 W SHARDING [mongosMain] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set 82477b90-45d1-4735-ac9d-716451b6de27
    2018-03-08T21:35:17.637+0000 W NETWORK  [monitoring keys for HMAC] Unable to reach primary for set 82477b90-45d1-4735-ac9d-716451b6de27
    

  4. Indeed, the config server never receives a connection from the mongos. Only connections from mongo-orchestration are present in the config server's log.
  5. Mongo-orchestration fails

Here's mongos failing to connect to the config server with a log level 5:

3-09T11:51:39.979-0800 D NETWORK  [monitoring keys for HMAC] Starting new refresh of replica set 561b1476-7e06-4a37-a2e0-5eedfa80d336
2018-03-09T11:51:39.980-0800 D NETWORK  [monitoring keys for HMAC] creating new connection to:localhost:1026
2018-03-09T11:51:39.981-0800 D -        [monitoring keys for HMAC] User Assertion: Location40356: connection pool: connect failed localhost:1026 : couldn't connect to server localhost:1026, connection attempt failed: SocketException: Connection refused src/mongo/client/connpool.cpp 394
2018-03-09T11:51:39.981-0800 W NETWORK  [monitoring keys for HMAC] Unable to reach primary for set 561b1476-7e06-4a37-a2e0-5eedfa80d336
2018-03-09T11:51:40.049-0800 D NETWORK  [mongosMain] Starting new refresh of replica set 561b1476-7e06-4a37-a2e0-5eedfa80d336
2018-03-09T11:51:40.049-0800 D NETWORK  [mongosMain] creating new connection to:localhost:1026
2018-03-09T11:51:40.050-0800 D -        [mongosMain] User Assertion: Location40356: connection pool: connect failed localhost:1026 : couldn't connect to server localhost:1026, connection attempt failed: SocketException: Connection refused src/mongo/client/connpool.cpp 394
2018-03-09T11:51:40.050-0800 W NETWORK  [mongosMain] Unable to reach primary for set 561b1476-7e06-4a37-a2e0-5eedfa80d336
2018-03-09T11:51:40.050-0800 I NETWORK  [mongosMain] Cannot reach any nodes for set 561b1476-7e06-4a37-a2e0-5eedfa80d336. Please check network connectivity and the status of the set. This has happened for 141 checks in a row.

For more info see https://github.com/10gen/mongo-orchestration/issues/239



 Comments   
Comment by Jonathan Reams [ 14/Mar/18 ]

We've decided not to fix this since the old behavior depended on some hard-coded hostname resolution rules ("localhost" always resolved to "127.0.0.1" no matter what) that we'd prefer not to carry forward. Two tickets to improve the current behavior have been opened and are linked to this ticket.

Comment by Shane Harvey [ 14/Mar/18 ]

This issue is actually a bug in mongo-orchestration's sharded cluster setup. It unintentionally starts a config server without --ipv6 when the rest of the nodes in the cluster have --ipv6.

I do have two requests:

  1. The server should resolve the hostname and attempt to connect to each address (IPv4 or IPv6). I'll open a new ticket for this.
  2. The connect error message should include the actual IP address. Right now, even at the highest logging level, it includes the hostname twice:
    "User Assertion: Location40356: connection pool: connect failed localhost:1026 : couldn't connect to server localhost:1026, connection attempt failed: SocketException: Connection refused src/mongo/client/connpool.cpp 394"

Comment by Jonathan Reams [ 13/Mar/18 ]

The issue is that the old egress networking code always resolved localhost to 127.0.0.1. This behavior dates back to the 1.8 days. This is pretty straightforward to fix, but this "fixed" behavior feels very strange to me.
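
For anyone who wants to see what the system resolver returns once "localhost" is no longer pinned to 127.0.0.1, here is a minimal Python sketch. It is not server code, and resolved_addresses is a hypothetical helper name. The result for "localhost" depends on the platform and its hosts file: it may be ::1, 127.0.0.1, or both, in either order.

```python
import socket


def resolved_addresses(host, port):
    """Return (address family name, IP address) pairs in resolver order.

    Hypothetical helper for illustration only.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family.name, sockaddr[0]) for family, _type, _proto, _canon, sockaddr in infos]


if __name__ == "__main__":
    # Output varies by machine; the port matches the config server in this ticket.
    for family_name, ip in resolved_addresses("localhost", 1026):
        print(family_name, ip)
```

If ::1 comes first and the config server only listens on IPv4, a client that tries just the first result gets "Connection refused", which matches the log lines in the description.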

Comment by Bernie Hackett [ 13/Mar/18 ]

Is the issue that when mongos is started with --ipv6 mongod also now has to be started with --ipv6? If mongod had been started with --ipv6 would we never have seen this bug?

Comment by Shane Harvey [ 12/Mar/18 ]

Again this is on both Windows and Mac OS with the latest build of the server:

$ ./mongodb-osx-x86_64-enterprise-3.7.2-416-g421c1e8/bin/mongo --version
MongoDB shell version v3.7.2-416-g421c1e8
git version: 421c1e82b30a14b3bca8e2bf5ef9df2745c5ee7a
OpenSSL version: OpenSSL 0.9.8zh 14 Jan 2016

Comment by Shane Harvey [ 12/Mar/18 ]

In the search for a repro without using mongo-orchestration (MO), I believe I've found the issue. Mongos cannot connect when started with the --ipv6 flag.
Simple repro:

  1. Start a config server replica set:

    $ ./mongodb-osx-x86_64-enterprise-3.7.2-416-g421c1e8/bin/mongod --oplogSize=100 --bind_ip=localhost --port=1026 --configsvr --dbpath=data-config --logpath=data-config/mongod.log --replSet=configrs &
    $ mongo --port 1026 --eval 'rs.initiate({_id: "configrs", members: [{_id:0, host:"localhost:1026"}]});'
    

  2. Start a mongos with --ipv6:

    $ mongodb-osx-x86_64-enterprise-3.7.2-416-g421c1e8/bin/mongos --bind_ip=localhost --port=27017 --configdb=configrs/localhost:1026 --ipv6
    ...
    2018-03-12T14:01:48.958-0700 W NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Unable to reach primary for set configrs
    2018-03-12T14:01:48.958-0700 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Cannot reach any nodes for set configrs. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
    2018-03-12T14:01:49.464-0700 W NETWORK  [monitoring keys for HMAC] Unable to reach primary for set configrs
    2018-03-12T14:01:49.465-0700 I NETWORK  [monitoring keys for HMAC] Cannot reach any nodes for set configrs. Please check network connectivity and the status of the set. This has h
    ...
    

Comment by Mark Benvenuto [ 12/Mar/18 ]

Do you have a repro without mongo-orchestration?

If the best way is to use mongo-orchestration:

  1. What version of mongo-orchestration is needed?
  2. What switches are needed?
  3. What config file is needed?
  4. Does this only repro on Windows?