[SERVER-20775] cluster not reachable while only one (of three) configserver was down
Created: 06/Oct/15  Updated: 24/Feb/16  Resolved: 10/Feb/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.4, 2.6.10
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Kay Agahd
Assignee: Ramon Fernandez Marina
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

We have one cluster consisting of 5 shards, each a replica set with 3 physical members. 3 configservers and 3 routers (mongos) are running on 3 different VMs, called sx350, sx351 and sx352. We also have 3 other VMs, called offerstore-en-router-01, offerstore-en-router-02 and offerstore-en-router-03, on which we have installed 3 more routers (mongos).
One VM (sx352) went down at 7 o'clock, so its configserver and router went down as well.

The problem is that no connections through the mongos on offerstore-en-router-01, offerstore-en-router-02 and offerstore-en-router-03 were possible until sx352 came back, roughly 20 minutes after it had crashed!

While sx352 was down, the mongo shell took so long to connect (using auth) that I closed it before the connection was established. Without --user and --password, the mongo shell connected quickly, but as soon as I entered db.auth("admin", "XXX") it blocked, so I closed it after a few seconds.

Do you know why one crashed configserver is able to compromise access to the cluster through mongos instances running on different VMs, and how one can avoid this issue?
Thanks!



 Comments   
Comment by Kay Agahd [ 24/Feb/16 ]

alexbool, your case seems to be different because "you still can make queries through mongo console and succeed", which was not possible in my case, the one for which I opened this ticket.

Comment by Alexander Bulaev [ 24/Feb/16 ]

This used to reproduce in our production and testing environments.
To reproduce it, you have to turn off the config server that is listed first in your configuration.
When reproduced, it only crashes in the Java driver; you can still make queries through the mongo console and succeed.
So far, we have not evaluated MongoDB 3.2.

Comment by Ramon Fernandez Marina [ 10/Feb/16 ]

kay.agahd@idealo.de, it just occurred to me that if the VM that went down was running the first config server this could also be SERVER-21531.

MongoDB 3.2 includes support for Config Servers as a Replica Set, which is a big improvement over mirrored config servers. If you continue to experience reachability issues in your cluster because of config server unavailability I'd recommend you test MongoDB 3.2.

Unfortunately I was not able to reproduce this specific behavior, so I'm going to resolve this ticket for now. If you find a reliable way to reproduce please post a comment here and we can reopen the ticket for further investigation.

Regards,
Ramón.

Comment by Kay Agahd [ 10/Nov/15 ]

Hi ramon.fernandez, sure, it's very difficult to reproduce. I also tried multiple times in vain. However, I wanted to let you know about the issue.
SERVER-17617 seems to be different because operations were blocked only for some seconds. Let me know if I can help you further.

Comment by Ramon Fernandez Marina [ 09/Nov/15 ]

kay.agahd@idealo.de, I was not able to reproduce this behavior using a trivial setup (1 shard, 1 mongos, 3 config servers). That being said, the long delay could be related to SERVER-17617.

Comment by Kay Agahd [ 06/Oct/15 ]

The logs of the configserver still running on sx350 show that user "admin" seemed to be able to connect during the time frame in which the sx352 server was down:

2015-10-06T07:06:37.885+0200 [initandlisten] connection accepted from 172.16.64.36:56921 #2601606 (57 connections now open)
2015-10-06T07:06:37.891+0200 [conn2601606]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:06:37.901+0200 [conn2601606] end connection 172.16.64.36:56921 (56 connections now open)
2015-10-06T07:06:37.937+0200 [initandlisten] connection accepted from 172.16.64.36:56924 #2601607 (57 connections now open)
2015-10-06T07:06:37.942+0200 [conn2601607]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:06:37.949+0200 [conn2601607] end connection 172.16.64.36:56924 (56 connections now open)
2015-10-06T07:06:37.994+0200 [conn2601603]  authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
2015-10-06T07:06:39.165+0200 [conn2601596] end connection 172.16.66.187:58349 (55 connections now open)
2015-10-06T07:06:39.745+0200 [conn2601604]  authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
2015-10-06T07:06:39.745+0200 [conn2601605]  authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
2015-10-06T07:06:40.568+0200 [conn2601601] end connection 172.16.70.6:53011 (54 connections now open)
2015-10-06T07:06:40.571+0200 [conn2601602] end connection 172.16.70.6:53012 (53 connections now open)
2015-10-06T07:06:40.584+0200 [initandlisten] connection accepted from 172.16.70.6:53019 #2601608 (54 connections now open)
2015-10-06T07:06:40.999+0200 [conn2601603] end connection 172.16.70.5:45467 (53 connections now open)
2015-10-06T07:06:41.018+0200 [initandlisten] connection accepted from 172.16.70.5:45471 #2601609 (54 connections now open)
2015-10-06T07:06:41.179+0200 [initandlisten] connection accepted from 172.16.64.36:56984 #2601610 (55 connections now open)
2015-10-06T07:06:41.184+0200 [conn2601610]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:06:41.192+0200 [conn2601610] end connection 172.16.64.36:56984 (54 connections now open)
2015-10-06T07:06:41.385+0200 [initandlisten] connection accepted from 172.16.67.198:34129 #2601611 (55 connections now open)
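
How quickly these connections completed on the configserver can be checked mechanically. The following is a minimal Python sketch, assuming the 2.6 log line layout shown above (timestamp, bracketed thread name, message). The helper name `connection_lifetimes` is hypothetical and not part of any MongoDB tool.

```python
import re
from datetime import datetime, timedelta

# "connection accepted" lines are logged by [initandlisten] and carry "#<id>".
ACCEPT_RE = re.compile(r"^(\S+) \[initandlisten\] connection accepted from \S+ #(\d+)")
# "end connection" lines are logged by the connection's own thread, e.g. [conn2601606].
END_RE = re.compile(r"^(\S+) \[conn(\d+)\] end connection")


def parse_ts(ts):
    """Parse a timestamp such as 2015-10-06T07:06:37.885+0200."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def connection_lifetimes(lines):
    """Return {connection id: lifetime in whole milliseconds} for connections
    that were both accepted and ended within the given lines."""
    accepted = {}
    lifetimes = {}
    for line in lines:
        m = ACCEPT_RE.match(line)
        if m:
            accepted[m.group(2)] = parse_ts(m.group(1))
            continue
        m = END_RE.match(line)
        if m and m.group(2) in accepted:
            delta = parse_ts(m.group(1)) - accepted[m.group(2)]
            lifetimes[m.group(2)] = delta // timedelta(milliseconds=1)
    return lifetimes
```

Applied to the excerpt above, the three connections from 172.16.64.36 that authenticated as "admin" (2601606, 2601607 and 2601610) each lasted between 12 and 16 ms, so authentication directly against the configserver on sx350 was not blocked.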

Other applications, written in Java, couldn't connect to any of the three routers running on offerstore-en-router-01, offerstore-en-router-02 and offerstore-en-router-03 either. Their stack traces are as follows:

2015-10-06 07:07:00,035 [cluster-ClusterId{value='561356e02b87182d9d5dd57f', description='null'}-offerstore-en-router-01.ipx:27017] INFO  o.mongodb.driver.cluster - Exception in monitor thread while connecting to server offerstore-en-router-01.ipx:27017
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
        at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:474) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:225) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.receiveReply(CommandHelper.java:134) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.receiveCommandResult(CommandHelper.java:121) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.executeCommand(CommandHelper.java:32) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.NativeAuthenticator.authenticate(NativeAuthenticator.java:46) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.DefaultAuthenticator.authenticate(DefaultAuthenticator.java:32) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnectionInitializer.authenticateAll(InternalStreamConnectionInitializer.java:99) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnectionInitializer.initialize(InternalStreamConnectionInitializer.java:44) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:115) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:127) ~[mongo-java-driver-3.0.1.jar:na]
        at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_25]
        at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[na:1.7.0_25]
        at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[na:1.7.0_25]
        at com.mongodb.connection.SocketStream.read(SocketStream.java:85) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:491) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221) ~[mongo-java-driver-3.0.1.jar:na]
        ... 10 common frames omitted
2015-10-06 07:07:30,035 [cluster-ClusterId{value='561356fe2b87182d9d5dd595', description='null'}-offerstore-en-router-02.ipx:27017] INFO  o.mongodb.driver.cluster - Exception in monitor thread while connecting to server offerstore-en-router-02.ipx:27017
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
        at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:474) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:225) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.receiveReply(CommandHelper.java:134) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.receiveCommandResult(CommandHelper.java:121) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.executeCommand(CommandHelper.java:32) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.NativeAuthenticator.authenticate(NativeAuthenticator.java:46) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.DefaultAuthenticator.authenticate(DefaultAuthenticator.java:32) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnectionInitializer.authenticateAll(InternalStreamConnectionInitializer.java:99) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnectionInitializer.initialize(InternalStreamConnectionInitializer.java:44) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:115) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:127) ~[mongo-java-driver-3.0.1.jar:na]
        at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_25]
        at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[na:1.7.0_25]
        at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[na:1.7.0_25]
        at com.mongodb.connection.SocketStream.read(SocketStream.java:85) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:491) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221) ~[mongo-java-driver-3.0.1.jar:na]
        ... 10 common frames omitted
2015-10-06 07:08:00,062 [cluster-ClusterId{value='5613571c2b87182d9d5dd599', description='null'}-offerstore-en-router-03.ipx:27017] INFO  o.mongodb.driver.cluster - Exception in monitor thread while connecting to server offerstore-en-router-03.ipx:27017
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
        at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:474) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:225) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.receiveReply(CommandHelper.java:134) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.receiveCommandResult(CommandHelper.java:121) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.CommandHelper.executeCommand(CommandHelper.java:32) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.NativeAuthenticator.authenticate(NativeAuthenticator.java:46) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.DefaultAuthenticator.authenticate(DefaultAuthenticator.java:32) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnectionInitializer.authenticateAll(InternalStreamConnectionInitializer.java:99) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnectionInitializer.initialize(InternalStreamConnectionInitializer.java:44) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:115) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:127) ~[mongo-java-driver-3.0.1.jar:na]
        at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_25]
        at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[na:1.7.0_25]
        at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[na:1.7.0_25]
        at com.mongodb.connection.SocketStream.read(SocketStream.java:85) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:491) ~[mongo-java-driver-3.0.1.jar:na]
        at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221) ~[mongo-java-driver-3.0.1.jar:na]
        ... 10 common frames omitted

Here are the mongod and mongos versions on offerstore-en-router-03:

[12:16:55]root@offerstore-en-router-03.ipx /home/admin# mongod --version
db version v2.6.10
2015-10-06T12:17:35.340+0200 git version: 5901dbfb49d16eaef6f2c2c50fba534d23ac7f6c
[12:17:35]root@offerstore-en-router-03.ipx /home/admin# mongos --version
MongoS version 2.6.10 starting: pid=31309 port=27017 64-bit host=offerstore-en-router-03 (--help for usage)
git version: 5901dbfb49d16eaef6f2c2c50fba534d23ac7f6c
 
build sys info: Linux build18.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49

Here are the mongod and mongos versions on sx352:

[12:17:06]root@sx352.ipx /home/admin# mongod --version
db version v2.6.4
2015-10-06T12:17:45.694+0200 git version: 3a830be0eb92d772aa855ebb711ac91d658ee910
[12:17:45]root@sx352.ipx /home/admin# mongos --version
MongoS version 2.6.4 starting: pid=21643 port=27017 64-bit host=sx352 (--help for usage)
git version: 3a830be0eb92d772aa855ebb711ac91d658ee910
 
build sys info: Linux build7.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49

I tried to reproduce the problem by shutting down the configserver on sx352 again. This time there was no problem connecting through any of the 3 mongos running on offerstore-en-router-01, offerstore-en-router-02 and offerstore-en-router-03.

Comment by Kay Agahd [ 06/Oct/15 ]

Since I tried to connect against mongos running on offerstore-en-router-03, I filtered the log file by "user: admin" and listed everything that belongs to that connectionId:

2015-10-06T07:04:03.216+0200 [conn139078] SyncClusterConnection connecting to [sx350:20019]
2015-10-06T07:04:03.219+0200 [conn139078] SyncClusterConnection connecting to [sx351:20019]
2015-10-06T07:04:03.221+0200 [conn139078] SyncClusterConnection connecting to [sx352:20019]
2015-10-06T07:04:03.752+0200 [conn139078] warning: Failed to connect to 172.16.65.204:20019, reason: errno:113 No route to host
2015-10-06T07:04:03.752+0200 [conn139078] SyncClusterConnection connect fail to: sx352:20019 errmsg: couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:04:03.757+0200 [conn139078] trying reconnect to sx352:20019 (172.16.65.204) failed
2015-10-06T07:04:06.753+0200 [conn139078] warning: Failed to connect to 172.16.65.204:20019, reason: errno:113 No route to host
2015-10-06T07:04:06.753+0200 [conn139078] reconnect sx352:20019 (172.16.65.204) failed failed couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:04:06.757+0200 [conn139078]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-10-06T07:11:20.471+0200 [conn139914]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-10-06T07:12:08.552+0200 [conn139995] SyncClusterConnection connecting to [sx350:20019]
2015-10-06T07:12:08.555+0200 [conn139995] SyncClusterConnection connecting to [sx351:20019]
2015-10-06T07:12:08.557+0200 [conn139995] SyncClusterConnection connecting to [sx352:20019]
2015-10-06T07:12:09.756+0200 [conn139995] warning: Failed to connect to 172.16.65.204:20019, reason: errno:113 No route to host
2015-10-06T07:12:09.756+0200 [conn139995] SyncClusterConnection connect fail to: sx352:20019 errmsg: couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:12:09.760+0200 [conn139995] trying reconnect to sx352:20019 (172.16.65.204) failed
2015-10-06T07:12:12.756+0200 [conn139995] warning: Failed to connect to 172.16.65.204:20019, reason: errno:113 No route to host
2015-10-06T07:12:12.756+0200 [conn139995] reconnect sx352:20019 (172.16.65.204) failed failed couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:12:12.762+0200 [conn139995]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-10-06T07:14:00.046+0200 [conn140185] SyncClusterConnection connecting to [sx350:20019]
2015-10-06T07:14:00.049+0200 [conn140185] SyncClusterConnection connecting to [sx351:20019]
2015-10-06T07:14:00.051+0200 [conn140185] SyncClusterConnection connecting to [sx352:20019]
2015-10-06T07:14:00.756+0200 [conn140185] warning: Failed to connect to 172.16.65.204:20019, reason: errno:113 No route to host
2015-10-06T07:14:00.756+0200 [conn140185] SyncClusterConnection connect fail to: sx352:20019 errmsg: couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:14:00.765+0200 [conn140185] trying reconnect to sx352:20019 (172.16.65.204) failed
2015-10-06T07:14:03.758+0200 [conn140185] warning: Failed to connect to 172.16.65.204:20019, reason: errno:113 No route to host
2015-10-06T07:14:03.758+0200 [conn140185] reconnect sx352:20019 (172.16.65.204) failed failed couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:14:03.765+0200 [conn140185] Unauthorized not authorized on admin to execute command { getLog: "startupWarnings" }
2015-10-06T07:14:10.631+0200 [conn140185] Unauthorized not authorized on admin to execute command { listDatabases: 1.0 }
2015-10-06T07:14:33.744+0200 [conn140185]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-10-06T07:22:29.397+0200 [conn139078] SyncClusterConnection connecting to [sx350:20019]
2015-10-06T07:22:29.405+0200 [conn139078] SyncClusterConnection connecting to [sx351:20019]
2015-10-06T07:22:29.407+0200 [conn139078] SyncClusterConnection connecting to [sx352:20019]
2015-10-06T07:22:29.409+0200 [conn139078] warning: Failed to connect to 172.16.65.204:20019, reason: errno:111 Connection refused
2015-10-06T07:22:29.409+0200 [conn139078] SyncClusterConnection connect fail to: sx352:20019 errmsg: couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:22:29.413+0200 [conn139078] trying reconnect to sx352:20019 (172.16.65.204) failed
2015-10-06T07:22:29.415+0200 [conn139078] warning: Failed to connect to 172.16.65.204:20019, reason: errno:111 Connection refused
2015-10-06T07:22:29.415+0200 [conn139078] reconnect sx352:20019 (172.16.65.204) failed failed couldn't connect to server sx352:20019 (172.16.65.204), connection attempt failed
2015-10-06T07:22:29.417+0200 [conn139078] Failed to authenticate admin@admin with mechanism MONGODB-CR: AuthenticationFailed key mismatch
2015-10-06T07:22:29.417+0200 [conn139078] Exception thrown while processing query op for admin.$cmd :: caused by :: 9001 socket exception [SEND_ERROR] server [127.0.0.1:46466] 
2015-10-06T07:22:29.417+0200 [conn139078] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:46466] 
2015-10-06T07:22:29.827+0200 [conn139995] Exception thrown while processing query op for admin.$cmd :: caused by :: 9001 socket exception [SEND_ERROR] server [127.0.0.1:47020] 
2015-10-06T07:22:29.827+0200 [conn139995] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:47020] 
2015-10-06T07:22:29.842+0200 [conn139914] Exception thrown while processing query op for admin.$cmd :: caused by :: 9001 socket exception [SEND_ERROR] server [172.16.70.7:55647] 
2015-10-06T07:22:29.842+0200 [conn139914] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [172.16.70.7:55647] 
2015-10-06T07:22:29.850+0200 [conn140185] Exception thrown while processing query op for admin.$cmd :: caused by :: 9001 socket exception [SEND_ERROR] server [127.0.0.1:47151] 
2015-10-06T07:22:29.850+0200 [conn140185] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:47151] 
2015-10-06T07:22:42.613+0200 [conn141390]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-10-06T07:22:50.446+0200 [conn141390] end connection 172.16.65.202:49961 (333 connections now open)
2015-10-06T07:23:01.487+0200 [conn141405]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "admin", key: "xxx" }
2015-10-06T07:27:40.114+0200 [conn141534]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:27:40.120+0200 [conn141534] end connection 172.16.64.36:35890 (454 connections now open)
2015-10-06T07:32:42.011+0200 [conn141622]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:32:42.020+0200 [conn141622] end connection 172.16.64.36:40781 (470 connections now open)
2015-10-06T07:37:38.995+0200 [conn141718]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:37:39.002+0200 [conn141718] end connection 172.16.64.36:44904 (463 connections now open)
2015-10-06T07:42:38.816+0200 [conn141773]  authenticate db: admin { authenticate: 1, user: "admin", nonce: "xxx", key: "xxx" }
2015-10-06T07:42:38.823+0200 [conn141773] end connection 172.16.64.36:49846 (478 connections now open)
2015-10-06T07:58:29.407+0200 [conn141405] end connection 172.16.70.7:57023 (481 connections now open)

As you can see, sx350 and sx351 were reachable but every connection to the configserver on sx352:20019 failed. The question is why it blocked, since the two other configservers were still reachable.
You can furthermore see that the connections were closed server-side at 07:22, when sx352 came back (I had already closed the mongo shells long before).
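
The filtering described at the top of this comment, and the time each "admin" authentication stayed pending, can be sketched in Python. This is only an illustration under the assumption that the mongos log uses the line layout shown above; the helper names `admin_connections` and `seconds_until_closed` are hypothetical.

```python
import re
from datetime import datetime

# timestamp, connection thread name, message (mongos 2.6 layout as seen above)
LINE_RE = re.compile(r"^(\S+) \[(conn\d+)\]\s+(.*)$")


def parse_ts(ts):
    """Parse a timestamp such as 2015-10-06T07:04:06.757+0200."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def admin_connections(lines):
    """Group log lines by connection and keep only the connections that
    tried to authenticate as user "admin" on the admin database."""
    by_conn = {}
    admin_ids = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        conn, msg = m.group(2), m.group(3)
        by_conn.setdefault(conn, []).append(line)
        if msg.startswith("authenticate db: admin") and 'user: "admin"' in msg:
            admin_ids.add(conn)
    return {conn: by_conn[conn] for conn in by_conn if conn in admin_ids}


def seconds_until_closed(conn_lines):
    """Seconds between the first authenticate attempt of a connection and the
    last line in which mongos reports closing it; None if either is missing."""
    start = end = None
    for line in conn_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = parse_ts(m.group(1)), m.group(3)
        if start is None and msg.startswith("authenticate db:"):
            start = ts
        if "closing client connection" in msg or msg.startswith("end connection"):
            end = ts
    if start is None or end is None:
        return None
    return (end - start).total_seconds()
```

For conn139078 in the excerpt above, this yields 1102.66 seconds between the authenticate line at 07:04:06.757 and the close at 07:22:29.417, i.e. a little over 18 minutes, which matches the roughly 20 minutes that sx352 was down.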
