Type: Task
Resolution: Done
Priority: Critical - P2
Fix Version/s: 2.6.7, 2.8.0-rc4
Affects Version/s: 2.7.8
Component/s: Sharding, Stability
Labels: None
Backwards Compatibility: Fully Compatible
Backport Completed: 2.6.7
Confidence Status: None
Work Order: 0
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The source code has a check that bypasses the shard version check when the primary connection is in a bad state. However, that check appears to be incomplete.

A connection is not recognized as bad until the OS detects the failure (e.g. via keepalive) or until we try to use it. So it is possible for the primary to be unreachable while the connection is still considered valid. In that case we try to check the shard version, which triggers a replica set refresh, and we hit the 5-second timeout there on every query. We can also be delayed by other threads that are performing the refresh, since it is a serialized action.

If we already know that the primary is down (i.e. the monitor knows), we can probably skip the shard version check.
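
To make the proposal concrete, here is a minimal sketch of the extra bypass in Python. It is not the server's actual code: `ReplicaSetMonitor`, `ShardConnection`, `primary_up`, and `maybe_check_shard_version` are hypothetical names invented for illustration, and the real check may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSetMonitor:
    """Hypothetical stand-in for the monitor's view of the replica set."""
    primary_up: bool = True


@dataclass
class ShardConnection:
    """Hypothetical connection wrapper; all names are illustrative only."""
    monitor: ReplicaSetMonitor
    failed: bool = False      # set only once the OS or a failed call notices
    version_checks: int = 0   # counts how often the expensive path ran

    def check_shard_version(self) -> None:
        # Stand-in for the path that can trigger a replica set refresh.
        self.version_checks += 1


def maybe_check_shard_version(conn: ShardConnection) -> bool:
    """Return True if the version check ran, False if it was bypassed."""
    # Existing-style bypass: the connection is already known to be bad.
    if conn.failed:
        return False
    # Proposed extra bypass: the monitor already knows the primary is down,
    # even though this connection still looks valid.
    if not conn.monitor.primary_up:
        return False
    conn.check_shard_version()
    return True
```

The point of the second condition is that a query on a connection that still looks valid would no longer enter the refresh path while the monitor reports the primary as down.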

Is related to: SERVER-16693 It is possible to read unowned data from the primary after fail-over (Closed)

Assignee: Spencer Brody (Inactive)
Reporter: Alexander Komyagin (Inactive)
Participants: Alexander Komyagin, Githook User, Spencer Brody
Votes: 0
Watchers: 12

Created: Nov 19 2014 05:22:34 PM UTC
Updated: Jun 19 2015 05:28:25 PM UTC
Resolved: Dec 22 2014 11:27:42 PM UTC
