[SERVER-4706] when a socket between mongos and mongod fails, close all connections immediately Created: 18/Jan/12  Updated: 24/May/17  Resolved: 01/Jul/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 2.6.0

Type: Bug Priority: Major - P3
Reporter: Kristina Chodorow (Inactive) Assignee: Randolph Tan
Resolution: Done Votes: 21
Labels: cap-ticket-needed, revisit
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-9041 proactively detect broken connections... Closed
Documented
is documented by DOCS-9039 when a socket between mongos and mong... Closed
Duplicate
is duplicated by SERVER-4997 Mongos not clearing stale connections Closed
Related
related to SERVER-9788 mongos does not re-evaluate read pref... Closed
related to SERVER-7629 Make DBClientReplicaSet draw connecti... Closed
is related to SERVER-9041 proactively detect broken connections... Closed
is related to SERVER-7573 Add tests for network connectivity lo... Closed
Tested
Operating System: ALL
Participants:

 Description   

Currently, mongos closes one connection at a time, even though it can "tell" that every connection to that shard is going to be bad. It could instead mark them all as failed and so return fewer errors to the client.

This leads to "Assertion: 13633:error querying server" errors (mentioned here for Google-ability).



 Comments   
Comment by Greg Studer [ 01/Jul/14 ]

This issue has become a bit of a hodgepodge:

Fixed in v2.4 (SERVER-4706, this ticket) - connection pools track socket creation time and dispose of earlier connections once a bad connection is reported
Fixed in v2.6 (SERVER-9041 and write commands) - connection pools check sockets every few seconds before releasing, and operations release all connections back to the pool when completed, allowing the SERVER-4706 fix to have more impact
Fixed in v2.7/8 (SERVER-9788) - replica set secondary connections are pooled
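To make the v2.4 behavior described above more concrete, here is a minimal Python sketch of a per-host pool that keeps a creation-time cutoff. It is not the actual mongos C++ code. The class and method names (ConnectionPool, report_bad_connection, idle_count) are hypothetical, the clock is injectable so the example is deterministic, and "opening a socket" is only simulated. When a connection is reported bad, every pooled connection to that host created at or before it is treated as stale and discarded, rather than being handed to the next client request one failure at a time.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PooledConnection:
    host: str
    created_at: float  # clock reading when the (simulated) socket was opened


class ConnectionPool:
    """Hypothetical per-host pool that evicts connections by creation time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._idle: Dict[str, List[PooledConnection]] = {}
        # host -> connections created at or before this time are considered stale
        self._cutoff: Dict[str, float] = {}

    def _is_stale(self, conn: PooledConnection) -> bool:
        cutoff = self._cutoff.get(conn.host)
        return cutoff is not None and conn.created_at <= cutoff

    def get(self, host: str) -> PooledConnection:
        idle = self._idle.setdefault(host, [])
        while idle:
            conn = idle.pop()
            if not self._is_stale(conn):
                return conn
            # Stale connections are dropped here instead of reaching a client.
        return PooledConnection(host, self._clock())  # simulate opening a new socket

    def release(self, conn: PooledConnection) -> None:
        if not self._is_stale(conn):  # never put a stale connection back in the pool
            self._idle.setdefault(conn.host, []).append(conn)

    def report_bad_connection(self, conn: PooledConnection) -> None:
        # Everything to this host created at or before the failed socket is suspect.
        prev = self._cutoff.get(conn.host)
        if prev is None or conn.created_at > prev:
            self._cutoff[conn.host] = conn.created_at
        # Purge all stale idle connections at once instead of one per failed request.
        self._idle[conn.host] = [
            c for c in self._idle.get(conn.host, []) if not self._is_stale(c)
        ]

    def idle_count(self, host: str) -> int:
        return len(self._idle.get(host, []))


if __name__ == "__main__":
    ticks = itertools.count()
    pool = ConnectionPool(clock=lambda: next(ticks))
    a, b, c = (pool.get("shard0:27018") for _ in range(3))
    pool.release(a)
    pool.release(b)
    pool.report_bad_connection(c)  # c failed; a and b are older, so they go too
    print(pool.idle_count("shard0:27018"))  # 0
```

The cutoff is per host, so a failure reported for one shard does not evict connections to another. Connections opened after the failed one are kept, on the assumption that they were established after the host became reachable again.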

Comment by Randolph Tan [ 08/Jul/13 ]

Hi,

Currently, there is no effective way to clear the entire pool of bad connections. However, there is a fix in the master branch (SERVER-9041) that would alleviate this issue by periodically disposing of bad connections (it does not work on Windows XP).

Thanks!
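The periodic-check approach mentioned above (SERVER-9041) can be sketched, under stated assumptions, as a pool that re-validates an idle connection before handing it out, but at most once per check interval. This is a hypothetical Python illustration, not the server's implementation: the names CheckingPool and Conn, the injected probe callable, and the 5-second default interval are all assumptions, and the probe used here only reads a simulated healthy flag instead of testing a real socket.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Conn:
    last_checked: float  # clock reading of the last successful liveness check
    healthy: bool = True  # simulated socket state; a real pool would probe the socket


class CheckingPool:
    """Hypothetical single-host pool that re-validates idle connections at most
    once per `check_interval` seconds before handing them out."""

    def __init__(self, probe: Callable[[Conn], bool], check_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe
        self._interval = check_interval
        self._clock = clock
        self._idle: List[Conn] = []
        self.disposed = 0  # number of connections dropped after a failed probe

    def release(self, conn: Conn) -> None:
        self._idle.append(conn)

    def get(self) -> Conn:
        now = self._clock()
        while self._idle:
            conn = self._idle.pop()
            if now - conn.last_checked < self._interval:
                return conn  # validated recently: skip the probe
            if self._probe(conn):
                conn.last_checked = now
                return conn
            self.disposed += 1  # dead socket: drop it rather than hand it out
        return Conn(last_checked=now)  # stands in for opening a fresh socket
```

The interval is the trade-off: a connection checked less than check_interval seconds ago is handed out without a probe, so a socket that died within that window can still reach a client, but the pool avoids paying for a liveness check on every request.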

Comment by Tito George [ 04/Jul/13 ]

Is there any recommended workaround for this, or is restarting mongos the only option? Is there any way in MongoDB to remove stale connections? I ran into the error below while doing a failover test on one of the replica sets. I hope this is the same issue discussed here.

Caused by: com.mongodb.MongoException: DBClientBase::findN: transport error: xx-xx-xx-xx.xxx:27018 ns: admin.$cmd query:

{ setShardVersion: "database.users", configdb: "xxx-xxx-xxx-xxx.xxx:27019,xxx-xxx-xxx-xxx.xxx:27019,xxx-xxx-xxx-xxx.xxx:27019", version: Timestamp 2000|0, versionEpoch: ObjectId('51d55747c0b7226ddb8f342d'), serverID: ObjectId('51d3d8e6c0b7226ddb8efd1d'), shard: "rs0", shardHost:"rs0/xxx-xxx-xxx-xxx.xxx:27018,xxx-xxx-xxx-xxx.xxx:27018,xxx-xxx-xxx-xxx.xxx:27018" }

at com.mongodb.MongoException.parse(MongoException.java:82) ~[mongo-java-driver-2.9.3.jar:na]

Comment by auto [ 18/Dec/12 ]

Author:

{u'date': u'2012-12-12T23:40:50Z', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-4706 when a socket between mongos and mongod fails, close all connections immediately

Additional fix for query/insert.
Branch: master
https://github.com/mongodb/mongo/commit/acc9aea88a7a56694ba6d7f51d22b61dfc2a9353

Comment by auto [ 21/Nov/12 ]

Author:

{u'date': u'2012-11-21T16:54:06Z', u'email': u'randolph@10gen.com', u'name': u'Randolph Tan'}

Message: Fix test for SERVER-4706

gle doesn't work with mongos (SERVER-7739) so use awaitReplication instead to synchronize the test.
Branch: master
https://github.com/mongodb/mongo/commit/5ef99388a9c6687e00bdfe399e7d44e77a8d1a08

Comment by auto [ 19/Nov/12 ]

Author:

{u'date': u'2012-11-19T03:45:03Z', u'email': u'randolph@10gen.com', u'name': u'Randolph Tan'}

Message: SERVER-4706 bbot compile fix for 32bit
Branch: master
https://github.com/mongodb/mongo/commit/63e189c67fd9e8f54fe18f59f4a8e6eb5cee29f5

Comment by auto [ 18/Nov/12 ]

Author:

{u'date': u'2012-11-18T23:15:19Z', u'email': u'randolph@10gen.com', u'name': u'Randolph Tan'}

Message: SERVER-4706 Fix buildbot compile failure on Linux client 64bit build
Branch: master
https://github.com/mongodb/mongo/commit/2232ee5fa337735a3e3c6f1fdbd29fae228724cb

Comment by Randolph Tan [ 17/Nov/12 ]

If SERVER-7629 uses either ScopedDbConnection or ShardConnection, the replica set connections will inherit this fix.

Comment by auto [ 17/Nov/12 ]

Author:

{u'date': u'2012-11-17T07:19:09Z', u'email': u'randolph@10gen.com', u'name': u'Randolph Tan'}

Message: buildbot compile fix for SERVER-4706 part 2
Branch: master
https://github.com/mongodb/mongo/commit/9b0661f6323a4f48c20d3d341e378a246593375e

Comment by auto [ 17/Nov/12 ]

Author:

{u'date': u'2012-11-17T07:01:26Z', u'email': u'randolph@10gen.com', u'name': u'Randolph Tan'}

Message: buildbot compile error fix for SERVER-4706
Branch: master
https://github.com/mongodb/mongo/commit/054155bb07d721a611f3c241f9b5f07667a72494

Comment by Randolph Tan [ 17/Nov/12 ]

Note: first commit only handles direct connections. Connections to replica sets are special since they encapsulate multiple connections within them and can do auto retry in certain cases (for example, in slaveOk reads).

Comment by auto [ 17/Nov/12 ]

Author:

{u'date': u'2012-11-05T17:05:39Z', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-4706 when a socket between mongos and mongod fails, close all connections immediately
Branch: master
https://github.com/mongodb/mongo/commit/83dbdf1a38b2620701ad85b307ecbb799caa38ba

Comment by Ben Becker [ 06/Aug/12 ]

Hi Mike,

That's correct; this feature was pushed to v2.3.0.

Regards,
Ben

Comment by Michael DelNegro [ 06/Aug/12 ]

Can you please confirm that this is not fixed in the 2.2.X codeline?

Thanks,
Mike

Comment by Ben Becker [ 14/May/12 ]

Hi Ajay,

This one is next on my plate. It will be in the v2.1.2 release, which is currently scheduled for the end of the month.

Regards,
-Ben

Comment by Ajay Batheja [ 14/May/12 ]

Any new update on this?

Generated at Thu Feb 08 03:06:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.