[DRIVERS-1969] Ignore read preference for $out/$merge on secondaries if any servers are pre-5.0 Created: 29/Oct/21 Updated: 20/May/22 Resolved: 25/Mar/22 |
|
| Status: | Closed |
| Project: | Drivers |
| Component/s: | CRUD |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Major - P3 |
| Reporter: | Jeffrey Yemin | Assignee: | Jeremy Mikola |
| Resolution: | Done | Votes: | 0 |
| Labels: | size-small, spec-change |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Driver Changes: | Needed |
| Quarter: | FY22Q4 |
| Downstream Changes Summary: | In mongodb/specifications@747b748, the rules for applying a read preference to aggregations using $out and $merge were changed. Previously, drivers were instructed to attempt using the read preference and only fall back to a primary if a pre-5.0 secondary was selected. This was changed to require drivers to always disregard the read preference if there is any evidence of a pre-5.0 server. If there are either no available servers or all servers are 5.0+ (or load balanced, where we assume 5.0+), drivers can utilize the read preference. Spec tests have not been changed, as this clarification really only changes behavior for mixed-version clusters or a replica set consisting of a single pre-5.0 primary, neither of which is tested. |
| Driver Compliance: |
|
| Description |
|
Currently the specification says:
Defining it precisely in terms of available and eligible servers (the language of the server selection specification) would be more prescriptive and reduce differences in driver implementations. Something like this:
Examples (all with secondary read preference):
|
| Comments |
| Comment by Githook User [ 10/Nov/21 ] |
|
Author: Jeremy Mikola <jmikola@gmail.com> (jmikola)
Message:
Only fall back to a primary read preference if there is evidence that the driver is connected to a pre-5.0 server.
|
| Comment by Jeremy Mikola [ 07/Nov/21 ] |
| Comment by Jeremy Mikola [ 03/Nov/21 ] |
Just to close the loop, david.storch and I sorted this out over Slack. He confirmed that mongos read preference behavior wasn't touched for PM-1770, so my earlier question about "what would mongos do" isn't really applicable. It's just going to enforce the $readPreference provided by the driver. |
| Comment by David Storch [ 02/Nov/21 ] |
|
jmikola I'm not sure I follow all the details of this thread, but is there any other help I can still provide here? Jeff's understanding is correct that it is never legal to have a cluster with a 5.0 mongos but a pre-5.0 mongod. In fact, I'm pretty sure that a 5.0 mongos would refuse to connect to a pre-5.0 mongod. |
| Comment by Kaitlin Mahar [ 02/Nov/21 ] |
|
Correct, the Swift driver delegates to mongoc_database_aggregate and mongoc_collection_aggregate for aggregation support. |
| Comment by Jeremy Mikola [ 02/Nov/21 ] |
|
jeff.yemin and I met earlier today to discuss this and came up with the following approach, which I think will more closely align with rstam's table:
This means that a 5.0+ secondary in a cluster containing one or more pre-5.0 servers will never be utilized; however, it also avoids the earlier BC issue where a replica set consisting of a lone pre-5.0 primary would not be used as a fallback and users would instead encounter a server selection failure.

As for the implementation, Jeff suggests handling this within the "Find suitable servers by topology type and operation type" step in the server selection algorithm (step 3 for multi-thread/async, step 6 for single-threaded).

There is still a potential issue for drivers downstream of libmongoc (e.g. PHPLIB), which utilize mongoc_client_select_server. That API only takes a for_writes boolean and an optional read preference. An alternative API would need to be introduced that also takes an optional wire version, such that the following logic could be enforced:

Jeff and I discussed how PHP could conceivably work around this without requiring API changes in libmongoc. If we were to stick with the current two-attempt approach (i.e. only fall back to a primary if the first attempt returns a pre-5.0 secondary), there would still be the BC break for a replica set consisting of a lone pre-5.0 primary. I'll continue to give that some thought, but I'd very much like to avoid an API change in libmongoc just for this feature.

I don't believe Swift is affected, as they may delegate to mongoc_collection_aggregate (or the equivalent database method) instead of handling server selection directly as PHPLIB does (cc: kaitlin.mahar). |
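For illustration only, here is a minimal Python sketch of the rule discussed in this comment: disregard the read preference whenever any available server is pre-5.0, and otherwise honor it. The `Server` type, the `effective_read_preference` function, and the use of plain strings for read preference modes are hypothetical and not taken from any driver or from libmongoc. The sketch also assumes that MongoDB 5.0 corresponds to maxWireVersion 13.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: a 5.0 server reports maxWireVersion 13.
MIN_WIRE_VERSION_5_0 = 13


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a driver's server description."""
    address: str
    kind: str  # "primary" or "secondary"
    max_wire_version: int


def effective_read_preference(
    available: Iterable[Server],
    requested: str,
    load_balanced: bool = False,
) -> str:
    """Return the read preference mode to use for an $out/$merge aggregation."""
    if load_balanced:
        # Load-balanced topologies are assumed to be 5.0+.
        return requested
    if any(s.max_wire_version < MIN_WIRE_VERSION_5_0 for s in available):
        # Any evidence of a pre-5.0 server: always use a primary.
        return "primary"
    # No available servers, or every available server is 5.0+.
    return requested
```

With this shape, a replica set consisting of a lone pre-5.0 primary resolves to a primary read preference up front, so selection can succeed instead of timing out while waiting for a secondary.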
| Comment by Jeremy Mikola [ 01/Nov/21 ] |
|
david.storch: Given the complexities of server selection discussed above, can you clarify how mongos handles server selection for executing $out/$merge pipelines on secondaries? Note that for a pre-5.0 mongos, the driver would always send RP(primary) to mongos, which isn't controversial. I'm specifically curious about the case where the driver forwards a read preference to a 5.0+ mongos and the shard(s) have any of the aforementioned set of servers.

Edit: jeff.yemin pointed out that this may not be an issue for mongos since the upgrade process requires that shards be upgraded before mongos. Given that, any 5.0+ mongos potentially receiving a non-primary read preference could be expected to only have 5.0+ nodes in its shard(s).

I was specifically curious about mongos' behavior when a shard is a replica set consisting of only a single primary. I suppose in this case, a 5.0+ mongos could always honor the read preference either way. So RP(secondary) would simply result in a server selection failure and there's never a question of whether mongos would decide to fall back to the lone primary. And the aforementioned case of there only being a pre-5.0 primary isn't applicable, since that would never exist behind a 5.0+ mongos (per the upgrade order). |
| Comment by Jeremy Mikola [ 01/Nov/21 ] |
|
I'll start by addressing examples in the issue description:
The existing text in Read preferences and server selection (from DRIVERS-823) instructs drivers to first apply any explicit/inherited read preference and, iff a pre-5.0 secondary is selected, fall back to a primary. With respect to the examples above, the fourth example should result in the 5.0 secondary being utilized. My current interpretation of the spec with respect to the table in the previous comment is as follows:
I've bolded the lines where my interpretation differs from Robert's table. I realize this leads to more non-deterministic cases for mixed-version clusters; however, my intention was to allow drivers more flexibility given their server selection implementations. I did not want to require drivers to modify their server selection algorithm just for this feature. Needing to consider whether "all available servers are 5.0+" would require more invasive changes.

As-is, drivers using the one-attempt approach can use their existing algorithm and only fall back to a primary (bypassing a full second attempt) if a pre-5.0 secondary is selected. The two-attempt approach is not much different, but would be necessary for drivers that can't easily bypass selection to do the fallback (e.g. PHP atop libmongoc).

The "[potential] behavioral BC break" scenario for Row 6 (P4, S-none) may need to be addressed regardless, as I didn't consider that in DRIVERS-823. But I'm not sure how we could easily handle that without invasive server selection changes to determine versions across all available servers in the topology. Without doing so, a (P4, S-none) cluster would be hard to distinguish from a (P4, S5) cluster where the secondaries are inaccessible. |
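As a rough illustration of the existing DRIVERS-823 behavior described in this comment (apply the read preference first, then fall back to a primary only if a pre-5.0 secondary was selected), consider the Python sketch below. All names here (`Server`, `select`, `select_for_out_merge`) are hypothetical. The toy `select` helper simply returns the first matching server, or `None` where a real driver would report a server selection failure; it does not model latency windows or random choice. The wire version threshold of 13 for MongoDB 5.0 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a 5.0 server reports maxWireVersion 13.
MIN_WIRE_VERSION_5_0 = 13


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a driver's server description."""
    address: str
    kind: str  # "primary" or "secondary"
    max_wire_version: int


def select(servers: List[Server], mode: str) -> Optional[Server]:
    """Toy selection: first server whose kind matches the mode, else None."""
    for server in servers:
        if server.kind == mode:
            return server
    return None


def select_for_out_merge(servers: List[Server], requested: str) -> Optional[Server]:
    """Apply the read preference, then fall back only for a pre-5.0 secondary."""
    chosen = select(servers, requested)
    if (
        chosen is not None
        and chosen.kind == "secondary"
        and chosen.max_wire_version < MIN_WIRE_VERSION_5_0
    ):
        # A pre-5.0 secondary cannot execute $out/$merge; use the primary.
        return select(servers, "primary")
    return chosen
```

Under this sketch, a (P4, S-none) topology with a secondary read preference yields no server at all, because the fallback is only triggered after a secondary has actually been selected. That is the behavioral BC break scenario mentioned above.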
| Comment by Robert Stam [ 29/Oct/21 ] |
|
I'm trying to understand with precision how to select a server for $out (on replica sets), which has been confounding me. To make sure I properly understand exactly how server selection should work in every possible scenario, I've prepared the following table to describe my interpretation. The first two columns represent different cluster configurations.
Rows 4-11 are all the "fallback to primary" scenarios.
Rows 13-15 are all the "apply the read preference" scenarios.
In English I would summarize this as:
|