[SERVER-22910] mongos keeps bad connections around to downed hosts
Created: 01/Mar/16  Updated: 06/Dec/22  Resolved: 16/Feb/18

Status: Closed
Project: Core Server
Component/s: Networking, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Adam Midvidy
Assignee: Backlog - Service Architecture
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File bad_connections_downed_shard.js    
Assigned Teams: Service Arch
Operating System: ALL
Participants:

 Description   

Because the legacy connection pool and the various NetworkInterfaceASIO connection pools are separate, information about a network error seen by one pool is not shared with the others.

Consider the following scenario:

In a sharded cluster, one shard is restarted. A client runs a find command against mongos, which fails because it uses a bad connection. Mongos then correctly dumps all the connections it has to the shard. The client retries the find and it works.

However, if the client then runs a count command, it fails, since bad connections to the downed shard are still present in the legacy connection pool.

The fix here is to drop all pooled connections to a bad host in ALL pools when a network error is detected.
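
The proposed fix can be sketched roughly as follows. This is only a toy model in Python, not the mongos code: `ConnectionPool`, `PoolRegistry`, `on_network_error`, and the host names are all hypothetical names chosen for illustration. It assumes every pool is registered in one place, so that an error observed by any pool can be fanned out to all of them.

```python
from collections import defaultdict


class ConnectionPool:
    """Toy pool (hypothetical): maps a host name to a list of idle connection ids."""

    def __init__(self, name):
        self.name = name
        self._idle = defaultdict(list)

    def add(self, host, conn_id):
        self._idle[host].append(conn_id)

    def drop_host(self, host):
        """Discard every pooled connection to `host`; return how many were dropped."""
        return len(self._idle.pop(host, []))

    def idle_count(self, host):
        return len(self._idle.get(host, []))


class PoolRegistry:
    """Hypothetical registry that fans a network error out to every pool."""

    def __init__(self):
        self._pools = []

    def register(self, pool):
        self._pools.append(pool)
        return pool

    def on_network_error(self, host):
        # Drop connections to the failed host in ALL pools, not only in the
        # pool that happened to observe the error.
        return {pool.name: pool.drop_host(host) for pool in self._pools}


registry = PoolRegistry()
legacy = registry.register(ConnectionPool("legacy"))
asio = registry.register(ConnectionPool("asio"))

# Both pools hold connections to the shard that is about to be restarted.
legacy.add("shard0:27018", 1)
legacy.add("shard0:27018", 2)
asio.add("shard0:27018", 3)
legacy.add("shard1:27018", 4)

# A network error seen on any connection to shard0 clears it everywhere.
dropped = registry.on_network_error("shard0:27018")
```

In this model, the failed find would trigger `on_network_error`, so the later count command would not pick up a stale connection from the legacy pool. Connections to other hosts (here `shard1:27018`) are left alone.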

I have also attached a jstest that reproduces the problem.



 Comments   
Comment by Mira Carey [ 16/Feb/18 ]

The new connection pool does drop connections to hosts we haven't talked to (in a while) and to downed hosts.

As most user-facing functionality has migrated to the TaskExec framework, this problem has gone away (or will continue to go away with further sharding migration).

Comment by Adam Midvidy [ 01/Mar/16 ]

The new connection pool does; I'm not sure about the legacy pool.

Comment by Scott Hernandez (Inactive) [ 01/Mar/16 ]

Do the pools reap the dead connections periodically? And if so, would this just speed up that cleanup?
