Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.3.0
Component/s: Internal Client, Sharding
ALL
ISSUE SUMMARY
New sharded connections may fail to connect if any shard has no available primary for an extended period.
This issue is one of four related issues that affect cluster availability when no primary is available for a shard. See SERVER-7246, SERVER-5625, SERVER-11971 and SERVER-12041 for more details.
USER IMPACT
When any replica set in a sharded cluster has no available primary, new connections may fail to perform secondary reads because of an initial heuristic shard version check or an initial authorization check.
The issue is present in all versions of MongoDB up to and including v2.4.8.
SOLUTION
Ignore failures of initial version check during connection and allow authorization against secondaries (primary is preferred when available).
In v2.4.9 only, mongos must be started with the following two parameters (in v2.6.0 and later this behavior is the default):
--setParameter ignoreInitialVersionFailure=true --setParameter authOnPrimaryOnly=false
These parameters can also be set on a running mongos with the following commands:
db.adminCommand({setParameter:1,ignoreInitialVersionFailure:true})
db.adminCommand({setParameter:1,authOnPrimaryOnly:false})
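When several mongos instances need the same runtime change, the two commands can be wrapped so that a failed setParameter is not silently ignored. The following is only a sketch: the helper name applyNoPrimaryParams is hypothetical and not part of MongoDB, and it assumes it is passed an object with an adminCommand() method (such as the db handle in the mongo shell) that returns a result document with an ok field.

```javascript
// Hypothetical helper (not a MongoDB API): sends both setParameter commands
// through any object exposing adminCommand(), e.g. `db` in the mongo shell.
function applyNoPrimaryParams(db) {
  var params = [
    { name: "ignoreInitialVersionFailure", value: true },
    { name: "authOnPrimaryOnly", value: false }
  ];
  var results = [];
  params.forEach(function (p) {
    // Build {setParameter: 1, <name>: <value>} for this parameter.
    var cmd = { setParameter: 1 };
    cmd[p.name] = p.value;
    var res = db.adminCommand(cmd);
    // Assumption: a falsy `ok` in the result document means the command failed.
    if (!res || !res.ok) {
      throw new Error("setParameter failed for " + p.name +
                      (res && res.errmsg ? ": " + res.errmsg : ""));
    }
    results.push(res);
  });
  return results;
}
```

In a shell connected to a v2.4.9 mongos, this would be invoked as applyNoPrimaryParams(db). Throwing on the first failure is a deliberate choice here, so that a mongos is not left with only one of the two parameters applied without the operator noticing.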
WORKAROUNDS
There is no direct workaround. Ensure that the replica sets in your sharded clusters have enough redundancy, and that the underlying architecture (network, WAN hosting, etc.) is robust and fault tolerant.
PATCHES
Production release v2.4.9 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.
Original Description
This issue is fixed, but depending on the type of connectivity issue between a mongos and the down primary, connection and query performance can be severely degraded in this scenario.
Latencies measured when testing different primary-down scenarios:
With killed processes, but functioning network:
First query average is 3 secs
Final average is 2 secs
With iptables DROP:
First query average is 428 sec
Final average is 254 sec
With iptables REJECT:
First query average is 473 sec
Final average is 255 sec
is duplicated by
- SERVER-2478 can't start usable mongos if replSet shard lacks a master (Closed)
- SERVER-6420 use primarypreferred instead of slaveOk to retrieve auth data (Closed)
- SERVER-7075 Queries fail if no primary server available (primaryPreferred read preference) (Closed)
- SERVER-8689 jstests/sharding/shard_insert_getlasterror_w2.js is failing (Closed)
- SERVER-7541 mongos should be able to read from secondaries when there is no master (Closed)
is related to
- SERVER-13768 sharded listDatabases command not tolerant of replica sets being down (Closed)
related to
- SERVER-5625 New sharded connections to a namespace trigger setShardVersion on all shards (Closed)
- SERVER-7111 DBClientReplicaSet::connect should not assert if primary is down but secondaries are available (Closed)
- SERVER-12221 Sleep in ReplicaSetMonitor::_check is causing latency for slaveOk() queries in sharded cluster when there is no primary (Closed)