[SERVER-9788] mongos does not re-evaluate read preference once a valid replica set member is chosen Created: 28/May/13 Updated: 08/Feb/23 Resolved: 30/Jun/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.4.3 |
| Fix Version/s: | 2.6.4, 2.7.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Remon van Vliet | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | All |
| Issue Links: | |
| Operating System: | ALL |
| Backport Completed: | |
| Steps To Reproduce: | 1) Create and start 3 member repset (primary, secondary, arbiter) |
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
| Comments |
| Comment by Michael Paik [ 11/Nov/14 ] |
|
renctan, is removing this bullet point the only applicable change? |
| Comment by Randolph Tan [ 22/Jul/14 ] |
|
More conservative fix for v2.6: added a new mongos server parameter, "internalDBClientRSReselectNodePercentage". It can be set to any value from 0 to 100 (defaults to 0) and represents the probability, expressed as a percentage, that a replica set connection in mongos will re-evaluate replica set node selection from scratch, regardless of whether the current read preference is compatible with the currently pinned node. Extra care should be taken since v2.6 doesn't pool secondary connections, so unpinning a node from the replica set connection has the side effect of destroying the connection. This means that in extreme cases (for example, 100%), mongos can end up creating and destroying a connection for every read request. |
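For anyone experimenting with this, a minimal sketch of enabling the parameter from a mongo shell connected to the mongos. The 10% value is purely illustrative, and runtime settability is an assumption on my part; the comment above only states that the parameter exists, so the safe path is passing it at mongos startup via --setParameter.

```
// Illustrative sketch only. Runtime settability is an assumption; the parameter
// can otherwise be supplied at mongos startup, e.g.
//   mongos --setParameter internalDBClientRSReselectNodePercentage=10 ...
db.adminCommand({ getParameter: 1, internalDBClientRSReselectNodePercentage: 1 });
db.adminCommand({ setParameter: 1, internalDBClientRSReselectNodePercentage: 10 });
```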
| Comment by Githook User [ 22/Jul/14 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: v2.6 fix: Added server parameter to tweak how frequently a replica set connection will re-evaluate node selection from scratch for a query with a read preference (i.e., decide not to use the cached connection regardless of read preference compatibility). |
| Comment by Remon van Vliet [ 14/Jul/14 ] |
|
Great, that sounds like the appropriate fix. |
| Comment by Randolph Tan [ 30/Jun/14 ] |
|
Changes made: 1. Secondary connections are now drawn from the global pool. |
| Comment by Githook User [ 30/Jun/14 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: |
| Comment by Scott Hernandez (Inactive) [ 28/Jan/14 ] |
|
The Java driver exhibits this behavior because it has a long-lived connection pool, and once a pooled connection maps to a backend it sticks to that one, providing consistency (see below). The goal is not to load balance each individual request/operation across the available replicas, but rather to distribute those connections when the sockets/connections are established. This yields the most consistent view of the data because it avoids reads from different replicas within the window of replication (so you don't see new data, then old data), which would otherwise produce reads out of normal time order.

Remon, replicas are not really good for read-scaling, unfortunately; if you want to scale, read or write, it is best to add more shards, not replicas. There are some exceptions to this, but they are few and far between and relate to over-saturating nodes and/or large node latencies. If you have a specific use case, it would be good to provide it here so we can suggest what to do. |
| Comment by Irina Kaprizkina [ 12/Nov/13 ] |
|
We are experiencing the same issue. In our tests it seems to point to the Java driver not being able to utilize a restarted, available secondary server. |
| Comment by Vinod Kumar [ 28/Oct/13 ] |
|
Hi, any updates on this? Seems like we saw the same behaviour in |
| Comment by Remon van Vliet [ 19/Jun/13 ] |
|
Any updates on this? I would like to know what decisions, if any, have been made regarding this issue, since it might mean we'll have to start working on a workaround. |
| Comment by Remon van Vliet [ 05/Jun/13 ] |
|
1) Ah, yes, that pretty much explains it. It would probably be good to provide a link to that section from the read preferences docs. |
| Comment by Randolph Tan [ 31/May/13 ] |
|
1) Although it was not clear in the documentation, the pinning behavior was described in the auto-retry section. |
| Comment by Remon van Vliet [ 30/May/13 ] |
|
I understand, but it's rather time consuming to isolate the test as it's currently built on top of some in-house tooling. It's almost certainly the pinning behaviour. I have a test that runs 20 threads that all do random reads from a test collection at maximum throughput. Database configuration is as described. I would argue this is actually a bug rather than a feature request, for the following reasons:

1) The contract for secondaryPreferred as described in the documentation is "... read from secondary members, but in situations where the set consists of a single primary (and no other members,) the read operation will use the set’s primary.". Currently it does not adhere to this (it will read from a primary in situations where there ARE other members).

2) The behaviour when connecting to a repset directly is not consistent with the behaviour when connecting through mongos. Drivers behave correctly (as in, do as advertised) whereas mongos does not.

3) The current behaviour can lead to prolonged, significantly degraded read and write throughput while a by then perfectly healthy secondary is available. With sufficiently long cluster uptimes this would almost certainly lead to situations where secondary nodes cannot be counted on to carry read load.

Hope you agree. |
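As an aside, one minimal way to observe which member actually serves a secondaryPreferred read through mongos from the mongo shell; the collection name here is illustrative, and the exact explain() output layout varies by version.

```
// Illustrative sketch: request secondaryPreferred and inspect which member answered.
db.getMongo().setReadPref("secondaryPreferred");
var exp = db.test.find().limit(1).explain();
printjson(exp);  // 2.4-era sharded explain output reports a "server" field per shard
```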
| Comment by Randolph Tan [ 29/May/13 ] |
|
I just wanted to make sure that what you are experiencing is the pinning behavior and not something else. If this is indeed the pinning behavior then I will convert this into a feature request to allow unpinning of connections. |
| Comment by Remon van Vliet [ 29/May/13 ] |
|
I understand the reasoning, and it's perfectly valid for various use cases, but I think that's a developer decision to make. If they want to avoid that behaviour they should not use a read preference of "secondary preferred", which implies the expectation that it will switch from primary to secondary when the latter becomes available. In that case the developer clearly prioritizes removing read load from the primary. I also don't think the back-in-time issue is that relevant for scenarios where developers have to take eventual consistency into account anyway (it is no different from switching from one secondary to the next when they are at different positions in the oplog).

I don't have a very practical way to share the entire test, unfortunately. Are you not able to reproduce? |
| Comment by Randolph Tan [ 29/May/13 ] |
|
Hi, Can you share the test project? The reasoning behind the pinning logic was to avoid going back in time as much as possible. For example, if doc A was deleted at time T and the client is connected to node0, which has optime > T, we want to avoid the situation where the client switches to node1, which has optime < T, making doc A visible to it again. That jump from a view of the world at optime > T to one at optime < T is what I was referring to as "going back in time". |
| Comment by Remon van Vliet [ 29/May/13 ] |
|
Hi, Yes, that is the behaviour I'm seeing. I would argue that it is not correct behaviour: a secondaryPreferred read preference should do exactly that and prefer secondary nodes when they are available. It is currently not following that contract, and there are very valid reasons why you would not want the current behaviour. Additionally, the behaviour isn't consistent with accessing repsets directly rather than through mongos. My test is multi-threaded, by the way. |
| Comment by Randolph Tan [ 28/May/13 ] |
|
Hi, The mongos pins the chosen node unless the node becomes unreachable or the read preference setting becomes incompatible with the selected node. In addition, mongos uses pooled connections, so if your test is single-threaded it is very likely using the same connection from the pool that was pinned. |
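In pseudocode, a hypothetical sketch of the selection rule just described, not the actual mongos source; the function and field names are made up for illustration.

```
// Hypothetical sketch, not mongos source: the connection keeps its pinned host
// until that host is unreachable or no longer satisfies the read preference.
function selectHost(rsConn, readPref) {
    if (rsConn.pinnedHost &&
        isReachable(rsConn.pinnedHost) &&                  // made-up helper
        satisfiesReadPref(rsConn.pinnedHost, readPref)) {  // made-up helper
        return rsConn.pinnedHost;   // no re-evaluation, even if a "better" member came back
    }
    rsConn.pinnedHost = chooseMemberFor(readPref);  // re-selected only on failure/incompatibility
    return rsConn.pinnedHost;
}
```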
| Comment by Remon van Vliet [ 28/May/13 ] |
|
Note that it works perfectly fine if the driver connects directly to the repset rather than through mongos. |