[SERVER-7246] Mongos cannot do slaveOk queries when primary is down Created: 03/Oct/12  Updated: 11/Jul/16  Resolved: 21/Dec/13

Status: Closed
Project: Core Server
Component/s: Internal Client, Sharding
Affects Version/s: 2.3.0
Fix Version/s: 2.4.9, 2.5.5

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Greg Studer
Resolution: Done Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: slaveOk_no_pri.js, slaveok_memberchange.js
Issue Links:
Depends
Duplicate
is duplicated by SERVER-2478 can't start usable mongos if replSet ... Closed
is duplicated by SERVER-6420 use primarypreferred instead of slave... Closed
is duplicated by SERVER-7075 Queries fail if no primary server ava... Closed
is duplicated by SERVER-8689 jstests/sharding/shard_insert_getlast... Closed
is duplicated by SERVER-7541 mongos should be able to read from se... Closed
Related
related to SERVER-5625 New sharded connections to a namespac... Closed
related to SERVER-7111 DBClientReplicaSet::connect should no... Closed
related to SERVER-12221 Sleep in ReplicaSetMonitor::_check is... Closed
is related to SERVER-13768 sharded listDatabases command not tol... Closed
Operating System: ALL
Participants:

 Description   
Issue Status as of January 8th, 2014

ISSUE SUMMARY
New sharded connections may fail to connect if any shard has no available primary for an extended period.

This issue is one of four related issues that impact cluster availability when a shard has no available primary. See SERVER-7246, SERVER-5625, SERVER-11971 and SERVER-12041 for more details.

USER IMPACT
When any replica set in a sharded cluster has no available primary, new connections may fail to perform secondary reads due to an initial heuristic shard version check or an initial authorization check.

The issue is present in all MongoDB versions up to and including v2.4.8.
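For illustration, the kind of read that can fail is an ordinary secondary (slaveOk) read issued through mongos while the shard has no primary. A minimal mongo shell sketch, assuming a reachable mongos at a placeholder host and placeholder database/collection names:

// Connect to a mongos (placeholder host:port) and allow reads from secondaries.
var conn = new Mongo("mongos.example.net:27017");
conn.setSlaveOk(true);
var testDB = conn.getDB("test");

// On affected versions, this first query from a new connection may fail while
// the shard's primary is down, even though a secondary could serve it.
printjson(testDB.mycoll.find().limit(1).toArray());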

SOLUTION
The fix ignores failures of the initial version check during connection and allows authorization against secondaries (the primary is preferred when available).

In v2.4.9 only (this behavior is the default in v2.6.0 and later), mongos must be started with the following two parameters:

--setParameter ignoreInitialVersionFailure=true
--setParameter authOnPrimaryOnly=false

These parameters can also be set on a running mongos with the following commands:

db.adminCommand({setParameter:1,ignoreInitialVersionFailure:true})
db.adminCommand({setParameter:1,authOnPrimaryOnly:false})
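As a sanity check, the values can be read back with getParameter (a shell sketch; these two parameters are only registered on a v2.4.9 mongos, so other versions will report them as unknown):

// Run against the admin database of the mongos after setting the parameters.
db.adminCommand({ getParameter: 1, ignoreInitialVersionFailure: 1, authOnPrimaryOnly: 1 })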

WORKAROUNDS
There is no direct workaround. Ensure that the replica sets in your sharded clusters have enough redundancy, and that the underlying infrastructure (network, WAN hosting, etc.) is robust and fault tolerant.

PATCHES
Production release v2.4.9 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Original Description

This issue is fixed, but depending on the type of connectivity issue between a mongos and the down primary, connection and query performance can be severely degraded in this scenario.

Latency results from testing different primary-down scenarios:

With killed processes, but functioning network:
First query average is 3 secs
Final average is 2 secs

With iptables DROP:
First query average is 428 sec
Final average is 254 sec

With iptables REJECT:
First query average is 473 sec
Final average is 255 sec
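A rough jstest-style sketch of how a primary-down latency measurement like the above can be reproduced (this is not the attached test script; the ShardingTest/ReplSetTest helper names used here are assumptions and may differ between shell versions):

// One replica-set shard with two nodes; stopping the primary leaves no majority,
// so the remaining node stays secondary.
var st = new ShardingTest({ shards: 1, rs: { nodes: 2 }, mongos: 1 });
var coll = st.s.getDB("test").foo;
coll.insert({ _id: 1 });
st.rs0.awaitReplication();

st.rs0.stop(st.rs0.getPrimary());

// Time the first slaveOk query from a fresh connection to the mongos.
var conn = new Mongo(st.s.host);
conn.setSlaveOk(true);
var start = Date.now();
conn.getDB("test").foo.findOne();
print("first slaveOk query took " + (Date.now() - start) + " ms");

st.stop();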



 Comments   
Comment by Asya Kamsky [ 04/Nov/15 ]

This is an old issue that has been fixed for over a year. If your question is not specifically about this issue, please ask it on the mongodb-user mailing list.

Comment by Edik Mkoyan [ 04/Nov/15 ]

If I have a network star topology, and I need the replica set members in the spokes to always be in the primary state, does that mean I should not use MongoDB?

Comment by waterbull [ 01/May/14 ]

Greg Studer, oh, I got it. I tested it using the commands "use <db>; db.<collection>.find()" and it works.
Thank you, Greg!

Comment by Greg Studer [ 28/Apr/14 ]

w.b. The listDatabases command (which is what "show databases" invokes) isn't a standard query. It uses a different codepath, so a primary being down will still prevent it from completing. I suspect 2.4.x has similar behavior here.

Opened SERVER-13768 to track, since this is a different issue.
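To illustrate the difference between the two codepaths in the shell (database and collection names are placeholders):

// With a shard's primary down and secondary reads enabled:
rs.slaveOk();                                   // or db.getMongo().setSlaveOk(true)

// An ordinary query can be served by a secondary:
db.getSiblingDB("test").mycoll.find().limit(1).toArray();

// listDatabases (what "show databases" runs) takes a different codepath and can
// still fail while the primary is down; tracked in SERVER-13768.
db.adminCommand({ listDatabases: 1 });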

Comment by waterbull [ 25/Apr/14 ]

I downloaded MongoDB 2.6.0 to test high availability.
Even after running "sh.slaveOk();" while the replica set's primary is down,
"show databases" reports:
"exception: ReplicaSetMonitor no master found for set: shard01".

Then I ran the following commands:
db.adminCommand({setParameter:1,ignoreInitialVersionFailure:true})
db.adminCommand({setParameter:1,authOnPrimaryOnly:false})

It reports:
"errmsg" : "no option found to set, use help:true to see options "

Then I ran:
db.adminCommand({setParameter:1,help:true});

It reports:
{
"help" : "help for: setParameter set administrative option(s)\n{ setParameter:1, <param>:<value> }\nsupported:\n _forceLegacyShardWriteMode\n authSchemaVersion\n clusterAuthMode\n connPoolMaxConnsPerHost\n connPoolMaxShardedConnsPerHost\n enableLocalhostAuthBypass\n enableTestCommands\n internalQueryCacheFeedbacksStored\n internalQueryCacheSize\n internalQueryCacheStdDeviations\n internalQueryCacheWriteOpsBetweenFlush\n internalQueryEnumerationMaxIntersectPerAnd\n internalQueryEnumerationMaxOrSolutions\n internalQueryForceIntersectionPlans\n internalQueryPlanEvaluationCollFraction\n internalQueryPlanEvaluationMaxResults\n internalQueryPlanEvaluationWorks\n internalQueryPlanOrChildrenIndependently\n internalQueryPlannerEnableIndexIntersection\n internalQueryPlannerMaxIndexedSolutions\n logLevel\n logUserIds\n quiet\n releaseConnectionsAfterResponse\n sslMode\n supportCompatibilityFormPrivilegeDocuments\n textSearchEnabled\n userCacheInvalidationIntervalSecs\n verboseQueryLogging\n",
"lockType" : 0,
"ok" : 1
}

Neither "ignoreInitialVersionFailure" nor "authOnPrimaryOnly" appears in the list of options.

Comment by Githook User [ 11/Dec/13 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-7246 test fixup for shard failure tests
Branch: master
https://github.com/mongodb/mongo/commit/dc7147dc394a10fb8ef2b1aa51f0f9d7593fc95a

Comment by Githook User [ 11/Dec/13 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-7246 legacy user authentication should use primary preferred read pref
Branch: master
https://github.com/mongodb/mongo/commit/f0ff2b32938f65d6f8625dbdb45cec479c87113d

Comment by Githook User [ 11/Dec/13 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-7246 rename internal dbclient_rs auth helper
Branch: master
https://github.com/mongodb/mongo/commit/913e08fc92778efadcc0f00d975017f09bf61696

Comment by Githook User [ 04/Dec/13 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-7246 rename internal dbclient_rs auth helper
Branch: v2.4
https://github.com/mongodb/mongo/commit/b113ec11a03ff3037d459928df1d492fad59d8eb

Comment by Githook User [ 04/Dec/13 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-7246 external user authentication should use primary preferred read pref
Branch: v2.4
https://github.com/mongodb/mongo/commit/3d8faaadab8ed44e30e410565868cac44d35b1c3

Comment by Greg Studer [ 04/Dec/13 ]

Mix-up in the SERVER ticket slug on this commit message:

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-5625 SERVER-7426 allow all connections in all states to tolerate downed shards and primaries
Branch: master
https://github.com/mongodb/mongo/commit/024a6739ac52ce669bff42b8782484f27a8cbc34

Comment by Spencer Brody (Inactive) [ 27/Aug/13 ]

From a simple test using the shell, this seems to affect the C++ driver as well.

Comment by Randolph Tan [ 23/Aug/13 ]

Attached an alternative procedure that demonstrates this issue.

Comment by Randolph Tan [ 22/Aug/13 ]

Attaching a crude test script that demonstrates the issue.

Comment by peanutgyz [ 29/Jan/13 ]

Is there any plan to fix this?
