[SERVER-16691] Creating a new connection in pymongo takes 15 seconds when the first config server is in the TCP blackhole Created: 29/Dec/14  Updated: 20/Nov/15  Resolved: 20/Nov/15

Status: Closed
Project: Core Server
Component/s: Sharding, Stability
Affects Version/s: 2.8.0-rc4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Alexander Komyagin
Assignee: Max Hirschhorn
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-17617 One config server being down can bloc... Closed
Operating System: ALL
Sprint: QuInt B (11/02/15), TIG B (10/30/15), QuInt C (11/23/15)
Participants:

 Description   
  1. start a new cluster with 3 config servers and one shard
  2. create a new sharded collection
  3. do some finds - all are fast (less than a second)
  4. block the first config server with iptables (-j DROP) so that packets to it are silently dropped
  5. create a new connection
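
To confirm that step 4 really produces a blackhole rather than a refused connection, the raw TCP connect time to each config server can be measured from the mongos host. This is a minimal sketch using only the Python standard library; the function name tcp_connect_seconds is hypothetical, and the 5-second default simply mirrors the timeout that appears in the mongos log.

```python
import socket
import time


def tcp_connect_seconds(host, port, timeout=5.0):
    """Return (elapsed_seconds, connected) for a single TCP connect attempt.

    A blackholed host should fail only after roughly `timeout` seconds,
    while a refused or successful connection returns almost immediately.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError on timeout, refusal, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            connected = True
    except OSError:
        connected = False
    return time.monotonic() - start, connected
```

For example, tcp_connect_seconds("mongo_CFG1", 27017) would be expected to return after about 5 seconds with connected=False once the DROP rule is in place, while the other two config servers should answer almost immediately.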

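It looks like the Python driver sends the "isMaster" command as the first op on the new connection, and mongos then repeatedly tries to connect to the blocked config server, waiting out the 5-second connect timeout three times (15 seconds in total):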

2014-12-29T22:09:38.989+0000 I NETWORK  [mongosMain] connection accepted from 192.168.0.1:53628 #3 (2 connections now open)
2014-12-29T22:09:38.990+0000 I NETWORK  [conn3] SyncClusterConnection connecting to [mongo_CFG1:27017]
2014-12-29T22:09:43.990+0000 W NETWORK  [conn3] Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up.
2014-12-29T22:09:43.990+0000 I NETWORK  [conn3] SyncClusterConnection connect fail to: mongo_CFG1:27017 errmsg: couldn't connect to server mongo_CFG1:27017 (192.168.0.97), connection attempt failed
2014-12-29T22:09:43.990+0000 I NETWORK  [conn3] SyncClusterConnection connecting to [mongo_CFG2:27017]
2014-12-29T22:09:43.991+0000 D NETWORK  [conn3] connected to server mongo_CFG2:27017 (192.168.0.98)
2014-12-29T22:09:43.991+0000 I NETWORK  [conn3] SyncClusterConnection connecting to [mongo_CFG3:27017]
2014-12-29T22:09:43.991+0000 D NETWORK  [conn3] connected to server mongo_CFG3:27017 (192.168.0.99)
2014-12-29T22:09:43.991+0000 I NETWORK  [conn3] unable to set SO_RCVTIMEO
2014-12-29T22:09:43.992+0000 I NETWORK  [conn3] trying reconnect to mongo_CFG1:27017 (192.168.0.97) failed
2014-12-29T22:09:48.993+0000 W NETWORK  [conn3] Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up.
2014-12-29T22:09:48.993+0000 I NETWORK  [conn3] reconnect mongo_CFG1:27017 (192.168.0.97) failed failed couldn't connect to server mongo_CFG1:27017 (192.168.0.97), connection attempt failed
2014-12-29T22:09:48.993+0000 I NETWORK  [conn3] query on config.databases: { _id: "admin" } failed to: mongo_CFG1:27017 (192.168.0.97) failed exception: socket exception [CONNECT_ERROR] for mongo_CFG1:27017 (192.168.0.97) failed
2014-12-29T22:09:48.993+0000 D SHARDING [conn3] DBConfig unserialize: admin { _id: "admin", partitioned: false, primary: "config" }
2014-12-29T22:09:48.995+0000 I NETWORK  [conn3] trying reconnect to mongo_CFG1:27017 (192.168.0.97) failed
2014-12-29T22:09:53.995+0000 W NETWORK  [conn3] Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up.
2014-12-29T22:09:53.995+0000 I NETWORK  [conn3] reconnect mongo_CFG1:27017 (192.168.0.97) failed failed couldn't connect to server mongo_CFG1:27017 (192.168.0.97), connection attempt failed
2014-12-29T22:09:53.995+0000 I NETWORK  [conn3] query on config.collections: { _id: /^admin\./ } failed to: mongo_CFG1:27017 (192.168.0.97) failed exception: socket exception [CONNECT_ERROR] for mongo_CFG1:27017 (192.168.0.97) failed
2014-12-29T22:09:53.996+0000 D SHARDING [conn3] found 0 dropped collections and 0 sharded collections for database admin
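
The 15-second figure can be checked against the log by summing the failed connect attempts per address. The following is a rough sketch, not part of any MongoDB tooling; the regular expression and the function name connect_timeout_totals are assumptions based only on the warning format shown in the excerpt above.

```python
import re
from collections import defaultdict

# Matches the warning format seen in the excerpt, e.g.
# "Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up."
FAILED_CONNECT = re.compile(
    r"Failed to connect to (?P<addr>\S+) after (?P<ms>\d+) milliseconds"
)


def connect_timeout_totals(log_lines):
    """Sum the milliseconds spent in failed connect attempts, per address."""
    totals = defaultdict(int)
    for line in log_lines:
        match = FAILED_CONNECT.search(line)
        if match:
            totals[match.group("addr")] += int(match.group("ms"))
    return dict(totals)


# The three warnings copied from the log excerpt in this ticket.
SAMPLE = [
    "2014-12-29T22:09:43.990+0000 W NETWORK  [conn3] Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up.",
    "2014-12-29T22:09:48.993+0000 W NETWORK  [conn3] Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up.",
    "2014-12-29T22:09:53.995+0000 W NETWORK  [conn3] Failed to connect to 192.168.0.97:27017 after 5000 milliseconds, giving up.",
]

print(connect_timeout_totals(SAMPLE))  # {'192.168.0.97:27017': 15000}
```

All 15000 ms are attributed to 192.168.0.97:27017 (mongo_CFG1): one timeout during the initial SyncClusterConnection setup and one reconnect attempt before each of the two config queries.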



 Comments   
Comment by Max Hirschhorn [ 20/Nov/15 ]

I tried to reproduce this issue using the steps described but was unsuccessful. Establishing a new connection to mongos isn't expected to automatically cause mongos to connect to the config servers. I received clarification from Bernie on PyMongo's behavior when establishing a new connection: PyMongo sends an isMaster request to the mongos and authenticates the connection with every credential specified on the MongoClient. It's unclear whether the sharded cluster for which this issue was reported had authentication enabled. Note that SERVER-17617 would cover a similar situation if it was authentication on the connection that triggered the mongos to try to read from the admin database.

Closing as "Cannot Reproduce".

Generated at Thu Feb 08 03:41:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.