[SERVER-14781] ReadPreference.secondaryPreferred doesn't failover if secondary in "recovering state" Created: 04/Aug/14  Updated: 04/Aug/14  Resolved: 04/Aug/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.10
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Alex Piggott Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-9788 mongos does not re-evaluate read pref... Closed
Operating System: ALL
Steps To Reproduce:

see above

Participants:

 Description   

I use the Java driver to query the database like this:

docdCursor = docDb.find(query, fields).batchSize(nFromServerLimit).setReadPreference(ReadPreference.secondaryPreferred());

In one particular cluster, I have a sharded DB with one shard, a replica set of two data-bearing members plus an arbiter (see the rs.status() output below) - the secondary is in the RECOVERING state.

I would therefore expect all my reads to go to the primary. In fact they were being routed to the secondary, resulting in errors returned from the above reads (referencing the fact that the queried node was recovering).

When I changed my config so that all reads went to the primary, the problems disappeared.
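The expected behaviour can be sketched as a small, self-contained simulation (the pickMember helper and class are hypothetical illustrations, not mongos internals): under secondaryPreferred, a RECOVERING member (state 3) should never be eligible, so with no healthy secondary the read should fall back to the primary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of how secondaryPreferred is expected to pick a member.
// State codes match rs.status(): 1 = PRIMARY, 2 = SECONDARY, 3 = RECOVERING.
public class ReadPrefSketch {
    static String pickMember(Map<String, Integer> memberStates) {
        // secondaryPreferred: prefer any member in SECONDARY state...
        for (Map.Entry<String, Integer> e : memberStates.entrySet()) {
            if (e.getValue() == 2) {
                return e.getKey();
            }
        }
        // ...but fall back to the primary when no secondary is eligible.
        // A RECOVERING member never qualifies.
        for (Map.Entry<String, Integer> e : memberStates.entrySet()) {
            if (e.getValue() == 1) {
                return e.getKey();
            }
        }
        return null; // no eligible member at all
    }

    public static void main(String[] args) {
        Map<String, Integer> states = new LinkedHashMap<>();
        states.put("192.168.8.106:27018", 1); // PRIMARY
        states.put("192.168.8.108:27018", 3); // RECOVERING
        System.out.println(pickMember(states)); // expected: the primary
    }
}
```

Per the linked ticket's title, the SERVER-9788 bug was that mongos did not re-evaluate the read preference against current member states, so reads kept going to a member that had dropped into RECOVERING.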

        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.8.106:27018",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 133328,
                        "optime" : Timestamp(1407176295, 181),
                        "optimeDate" : ISODate("2014-08-04T18:18:15Z"),
                        "lastHeartbeat" : ISODate("2014-08-04T18:21:43Z"),
                        "lastHeartbeatRecv" : ISODate("2014-08-04T18:21:43Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 1,
                        "name" : "192.168.8.108:27018",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 1729948,
                        "optime" : Timestamp(1406690192, 40),
                        "optimeDate" : ISODate("2014-07-30T03:16:32Z"),
                        "maintenanceMode" : -16,
                        "errmsg" : "still syncing, not yet to minValid optime 53db9bc5:34",
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.8.106:27218",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 133328,
                        "lastHeartbeat" : ISODate("2014-08-04T18:21:42Z"),
                        "lastHeartbeatRecv" : ISODate("2014-08-04T18:21:43Z"),
                        "pingMs" : 0
                }
        ],



 Comments   
Comment by Thomas Rueckstiess [ 04/Aug/14 ]

Closing this issue as duplicate of SERVER-9788.

Comment by Alex Piggott [ 04/Aug/14 ]

ah perfect - thanks, I'm about to upgrade everyone to 2.6.x so I'll just hold off until 2.6.4 is ready.

Apologies for the poor googling skills in not finding 9788 without having to bother you (I found it first time when I just tried again)

(re: too stale - yes, this cluster has some problems so it's not the first time I've had to resync .. I just mentioned it in case you wanted me to keep it in the errored state for debugging, but obviously not since it's already fixed!)

Comment by Asya Kamsky [ 04/Aug/14 ]

Alex,

The secondaryPreferred issue turns out to be a duplicate of SERVER-9788 which has been fixed for 2.6.4 (which has a release candidate available for testing and should be released very soon).

As for the cluster being too stale to catch up - this requires manual intervention: you have to fully resync the secondary using the procedure described here.

Asya

Comment by Alex Piggott [ 04/Aug/14 ]

"Can you confirm that you are connecting through mongos here? If the driver was connecting directly to the replica set, then this report should be moved to the Java project. However, I'm assuming that you are connecting through mongos and mongos is connected to the shards as replica sets, in which case this would be a mongos issue."

I am connecting through a mongos (version 2.4.10 also)

Comment by Alex Piggott [ 04/Aug/14 ]

Java driver is 2.10, in case that makes a difference

The cluster is still in the errored state (in fact rather worryingly, its optimeDate appears to be static?! ) so if there are any quick tests you want me to run, let me know

oh it's just too stale to catch up - so let me know if you need the cluster in this state, else I'll kick off a resync...

Comment by Asya Kamsky [ 04/Aug/14 ]

Hi Alex,

Sorry, it looks like I misread the original report.

Can you confirm that you are connecting through mongos here? If the driver was connecting directly to the replica set, then this report should be moved to the Java project. However, I'm assuming that you are connecting through mongos and mongos is connected to the shards as replica sets, in which case this would be a mongos issue.

Are all the components in your environment 2.4.10 (shards, mongos)?

Asya

Comment by Asya Kamsky [ 04/Aug/14 ]

Reopening due to misreading the original readPreference used.

Comment by Alex Piggott [ 04/Aug/14 ]

UPDATE: I just saw that you misread the code snippet and thought I was using secondary, when actually I am using secondaryPreferred.

So I think this needs to be re-opened, and you can ignore the rest of this comment, which only applies if you had already seen that I was using secondaryPreferred.

[old comment:
Oh ok thanks for the quick update... it's a pretty strange use case that:

  • one primary, no secondary => "secondaryPreferred" always reads from primary
  • one primary, one failed secondary => "secondaryPreferred" errors out

seems like those 2 cases should logically be equivalent, no?

Is there really no read preference that gives me:

  • read from the secondary unless it's not available, in which case read from the primary

You should probably also update the documentation, eg from http://docs.mongodb.org/manual/reference/read-preference/:

1] "In most situations, operations read from secondary members, but in situations where the set consists of a single primary (and no other members), the read operation will use the set’s primary."

... should clarify that it will error if there are no available secondaries; a natural interpretation would be that "no other members" means "no other available members"

2] "To shift read load from the primary, use mode secondary. Although secondaryPreferred is tempting for this use case, it carries some risk: if all secondaries are unavailable and your set has enough arbiters to prevent the primary from stepping down, then the primary will receive all traffic from clients."

...This very clearly implies that if there are unavailable secondaries then the reads will be redirected to the primary, no?!
]

Comment by Asya Kamsky [ 04/Aug/14 ]

EDIT: disregard this comment - it applies to "secondary", but "secondaryPreferred" is being used.

Alex, this is how secondary is designed to work.
The secondary read preference means reads go only to secondaries; if none is available, an error is returned.
You can see the available modes here:
http://docs.mongodb.org/manual/core/read-preference/#read-preference-modes
I'm guessing based on your description you expected behavior described by secondaryPreferred readPreference.
