[SERVER-17617] One config server being down can block read operations on config data for seconds Created: 16/Mar/15  Updated: 05/Feb/16  Resolved: 20/Nov/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.0-rc4

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-17668 Mongos Fail over did not work as expe... Closed
is duplicated by SERVER-21007 When the first config server's networ... Closed
is duplicated by SERVER-21496 Connection is slowly when the first c... Closed
Related
related to SERVER-16690 All inserts are delayed by 5 sec when... Closed
related to SERVER-16691 Creating a new connection in pymongo ... Closed
related to SERVER-1448 Host sharding config data on a replic... Closed
related to SERVER-11980 Improve user cache invalidation enfor... Closed
is related to SERVER-22486 query to the router never done when t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding A (10/09/15), QuInt B (11/02/15), TIG B (10/30/15), QuInt C (11/23/15)
Participants:

 Description   

See comment on SERVER-11980. A user reported that with one config server down, repopulating the user cache after it was invalidated took over 10 seconds.



 Comments   
Comment by Max Hirschhorn [ 20/Nov/15 ]

Fixed by 6c5d292.


Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21293 Add network timeout to the query fetcher

The query fetcher currently does not pass a timeout, and because of this,
if the config server is black-holed, a command may never complete.
Branch: master
https://github.com/mongodb/mongo/commit/6c5d292463c0230104a4ca14716d8e82ebbcc2aa
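
The failure mode named in the commit message (a request to a black-holed server that never completes unless a network timeout is passed) can be illustrated with a minimal sketch. This is not the server's query fetcher code. The names `fetch_with_timeout` and `start_black_hole` are hypothetical, and the "black hole" here is only a local listener that accepts connections at the TCP level but never replies, so it models the post-connect hang rather than dropped packets.

```python
import socket
import time


def fetch_with_timeout(host, port, request, timeout_s):
    """Send `request` and wait for a reply, giving up after `timeout_s` seconds.

    Returns the reply bytes, or None if the peer did not answer in time.
    (Hypothetical helper for illustration only.)
    """
    # The timeout passed here bounds the connect and is inherited by the socket,
    # so the later send and recv calls are bounded as well.
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(request)
        try:
            return sock.recv(4096)
        except socket.timeout:
            # Without a timeout, recv() would block here indefinitely.
            return None


def start_black_hole():
    """Listen on a local port but never read or reply (stand-in for a hung server)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    return listener


listener = start_black_hole()
host, port = listener.getsockname()

started = time.monotonic()
reply = fetch_with_timeout(host, port, b"ping", timeout_s=0.5)
elapsed = time.monotonic() - started
listener.close()

print("reply:", reply, "elapsed seconds:", round(elapsed, 1))
```

With the timeout in place the caller gets control back after roughly `timeout_s` and can retry against another server. Without it, the same call would wait for as long as the peer stays silent.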

Comment by Spencer Brody (Inactive) [ 27/Oct/15 ]

Going to make one more attempt to repro the original issue using SIGSTOP.

Comment by Spencer Brody (Inactive) [ 22/Oct/15 ]

I strongly suspect that switching the config servers to a replica set will resolve this issue. However, I am unable to reproduce the original issue (including on 3.0), so it is impossible to confirm for certain.

Comment by Spencer Brody (Inactive) [ 16/Mar/15 ]

Assigning to 3.1 Required to indicate that by the end of 3.1 we at least need to take another look at this and verify if the config server communication refactor we're doing anyway fixes this for free.
