[SERVER-16237] Don't check the shard version if the primary server is down
Created: 19/Nov/14  Updated: 19/Jun/15  Resolved: 22/Dec/14

Status: Closed
Project: Core Server
Component/s: Sharding, Stability
Affects Version/s: 2.7.8
Fix Version/s: 2.6.7, 2.8.0-rc4

Type: Task
Priority: Critical - P2
Reporter: Alexander Komyagin
Assignee: Spencer Brody (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-16693 It is possible to read unowned data f... Closed
Tested
Backwards Compatibility: Fully Compatible
Backport Completed:
Participants:

 Description   

The source code has a check that bypasses the shard version check when the primary connection is in a bad state. However, that check appears to be incomplete.

A connection is not recognized as bad until the OS detects the failure (e.g. via keepalive) or until we try to use it. It is therefore possible for the primary to be unreachable while its connection is still treated as valid. In that case we attempt the shard version check, which triggers a replica set refresh, and every query hits the 5-second timeout there. Because the refresh is serialized, a query can also be delayed further by other threads that are already refreshing.

If we already know that the primary server is down (i.e. the replica set monitor knows), we can probably skip the shard version check.
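
The proposal can be illustrated with a small model. This is only a sketch of the decision described above, not the actual mongos code: `PrimaryView`, `should_check_shard_version`, `added_latency_secs`, and the `monitor_aware` flag are hypothetical names, and the model assumes the monitor's view of the primary is accurate. The only figure taken from this ticket is the 5-second refresh timeout.

```python
from dataclasses import dataclass

# The 5-second replica set refresh timeout mentioned in the description.
REFRESH_TIMEOUT_SECS = 5.0


@dataclass
class PrimaryView:
    """What a router believes about a shard's primary (hypothetical model)."""
    connection_looks_valid: bool   # socket not yet flagged bad by the OS or by use
    monitor_says_primary_up: bool  # replica set monitor's current view


def should_check_shard_version(view: PrimaryView, monitor_aware: bool) -> bool:
    """Decide whether to run the shard version check on the primary connection.

    monitor_aware=False models the incomplete check: only the connection
    state is consulted. monitor_aware=True models the proposed behaviour:
    the check is also skipped when the monitor knows the primary is down.
    """
    if not view.connection_looks_valid:
        return False  # existing bypass: the connection is already known to be bad
    if monitor_aware and not view.monitor_says_primary_up:
        return False  # proposed bypass: the monitor already knows the primary is down
    return True


def added_latency_secs(view: PrimaryView, monitor_aware: bool) -> float:
    """Extra per-query delay in this model.

    Assumption: a version check against a primary the monitor reports as
    down triggers a refresh that runs into the full timeout.
    """
    checks = should_check_shard_version(view, monitor_aware)
    primary_down = not view.monitor_says_primary_up
    return REFRESH_TIMEOUT_SECS if checks and primary_down else 0.0


# Stale connection: the socket still looks fine, but the monitor knows better.
stale = PrimaryView(connection_looks_valid=True, monitor_says_primary_up=False)
print(added_latency_secs(stale, monitor_aware=False))
print(added_latency_secs(stale, monitor_aware=True))
```

In this model the stale-connection case pays the timeout only when the monitor's view is ignored. Delays from waiting on other threads' serialized refreshes are not modelled here.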



 Comments   
Comment by Githook User [ 06/Jan/15 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-16237 Don't check shard version if the replica set monitor knows the primary is down
Branch: v2.6
https://github.com/mongodb/mongo/commit/357cf5e9029db85ce36cd6c7ef181edcc142f493

Comment by Alexander Komyagin [ 23/Dec/14 ]

Results of testing Spencer's fix:

MongoS version 2.8.0-rc4-pre- starting: pid=19905 port=27017 64-bit host=ip-10-45-3-116 (--help for usage)
_DEBUG build
git version: b0459b8ef0d506501cc0a9ae062cd788f77c02ce

It appears to have done the trick:

  • an existing connection that reads with primaryPreferred, after blocking the primary and sleeping 30 sec, was able to read without any delays
  • a new connection that reads with primaryPreferred, after blocking the primary and sleeping 30 sec, was able to read without any delays

Comment by Githook User [ 22/Dec/14 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-16237 Don't check shard version if the replica set monitor knows the primary is down
Branch: master
https://github.com/mongodb/mongo/commit/0ba73576bbe465097c825ba946f561c267465a88
