[SERVER-9916] be smarter about config server retries in non-responsive situations Created: 12/Jun/13  Updated: 10/Dec/14  Resolved: 07/Mar/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.5.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Greg Studer Assignee: Unassigned
Resolution: Duplicate Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-11332 Authentication requests delayed if fi... Closed
Participants:

 Description   

In certain failure modes, retries against a failed config server can take several seconds and block queries to the secondary and tertiary config servers. When possible, we should be smarter about reading from other config servers when one is unavailable. This especially impacts authenticated clusters: authentication data is not cached in mongos, so new authenticated connections are initially slow to respond.

Example:
1. First config server goes down and is unresponsive to the network, but does not reject packets.
2. A new authenticated connection is created to mongos.
3. Mongos tries to read from the first config server and attempts to reconnect before the read. The reconnect eventually fails, but only after the timeout of several seconds.
4. Mongos successfully reads from the second config server, but the overall response time is poor.
5. This repeats for every subsequent new connection: each one waits for the full timeout, even though the server is still unavailable.
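
The behaviour asked for here can be sketched as follows. This is only an illustration of the idea, not mongos code and not the fix that was eventually shipped: the names FailedHostTracker, read_with_failover, and cooldown_secs are hypothetical, and the 30 second cooldown is an arbitrary assumption. The sketch remembers which config server recently failed and tries the others first, so that only the first new connection pays the timeout.

```python
import time
from typing import Callable, Dict, Optional, Sequence, TypeVar

T = TypeVar("T")


class FailedHostTracker:
    """Remembers recently failed hosts so callers can deprioritize them.

    Hypothetical helper for illustration; the cooldown value is an assumption.
    """

    def __init__(self, cooldown_secs: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown_secs = cooldown_secs
        self._clock = clock
        self._failed_at: Dict[str, float] = {}

    def mark_failed(self, host: str) -> None:
        self._failed_at[host] = self._clock()

    def mark_ok(self, host: str) -> None:
        self._failed_at.pop(host, None)

    def is_suspect(self, host: str) -> bool:
        """True while the host's last failure is within the cooldown window."""
        failed_at = self._failed_at.get(host)
        if failed_at is None:
            return False
        return self._clock() - failed_at < self._cooldown_secs


def read_with_failover(hosts: Sequence[str],
                       read_fn: Callable[[str], T],
                       tracker: FailedHostTracker) -> T:
    """Read from the first host that answers.

    Hosts that failed recently are moved to the back of the list, so they
    are tried only if every other host also fails.
    """
    # sorted() is stable and False sorts before True, so the original
    # order is kept within the healthy and suspect groups.
    ordered = sorted(hosts, key=tracker.is_suspect)
    last_error: Optional[Exception] = None
    for host in ordered:
        try:
            result = read_fn(host)
        except OSError as exc:  # includes TimeoutError and ConnectionError
            tracker.mark_failed(host)
            last_error = exc
            continue
        tracker.mark_ok(host)
        return result
    raise ConnectionError("no config server reachable") from last_error
```

With a tracker shared across connections, step 5 of the example would no longer wait on the first config server until the cooldown expires, at which point a single request probes it again. A suspect host is deprioritized rather than skipped entirely, so reads still succeed when it is the only server left.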



 Comments   
Comment by Greg Studer [ 03/Jul/13 ]

Not necessarily. If the database primary information has to be read from the config server, authenticating to another database may also be slow initially (though that information is cached). This is also not specific to authentication; it affects any operation that uses the config server.

Comment by Justin Patrin [ 13/Jun/13 ]

This is specific to authenticating to the admin db, correct? Authentication against other databases should work regardless of config server state?
