[DRIVERS-1969] Ignore read preference for $out/$merge on secondaries if any servers are pre-5.0 Created: 29/Oct/21  Updated: 20/May/22  Resolved: 25/Mar/22

Status: Closed
Project: Drivers
Component/s: CRUD
Fix Version/s: None

Type: Spec Change Priority: Major - P3
Reporter: Jeffrey Yemin Assignee: Jeremy Mikola
Resolution: Done Votes: 0
Labels: size-small, spec-change
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2021-10-29-16-41-37-013.png    
Issue Links:
Depends
is depended on by PYTHON-2554 Support $merge and $out executing on ... Closed
Issue split
split to PHPLIB-760 Ignore read preference for $out/$merg... Closed
split to CDRIVER-4224 Ignore read preference for $out/$merg... Closed
split to CSHARP-3957 Ignore read preference for $out/$merg... Closed
split to CXX-2414 Ignore read preference for $out/$merg... Closed
split to GODRIVER-2220 Ignore read preference for $out/$merg... Closed
split to JAVA-4396 Ignore read preference for $out/$merg... Closed
split to MOTOR-859 Ignore read preference for $out/$merg... Closed
split to NODE-3752 Ignore read preference for $out/$merg... Closed
split to PYTHON-3012 Ignore read preference for $out/$merg... Closed
split to RUBY-2841 Ignore read preference for $out/$merg... Closed
split to RUST-1097 Ignore read preference for $out/$merg... Closed
Related
is related to DRIVERS-823 Support $merge and $out executing on ... Implementing
is related to JAVA-4380 Fix server selection logic for $out/$... Closed
is related to DRIVERS-1955 CRUD spec for $out/$merge on secondar... Closed
Driver Changes: Needed
Quarter: FY22Q4
Downstream Changes Summary:

In mongodb/specifications@747b748, the rules for applying a read preference to aggregations using $out and $merge were changed. Previously, drivers were instructed to attempt using the read preference and fall back to a primary only if a pre-5.0 secondary was selected.

This was changed to require drivers to always disregard the read preference if there is any evidence of a pre-5.0 server. If there are no available servers, or if all available servers are 5.0+ (or the topology is load balanced, where 5.0+ is assumed), drivers can apply the read preference.

Spec tests have not been changed, as this clarification only changes behavior for mixed-version clusters or a replica set consisting of a single pre-5.0 primary, neither of which is tested.
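
For illustration, a minimal sketch of the revised rule in Python. The Server type and effective_read_preference helper are hypothetical, not any driver's actual API; the only hard fact used is that maxWireVersion 13 corresponds to MongoDB 5.0.

    from dataclasses import dataclass

    WIRE_VERSION_5_0 = 13  # maxWireVersion 13 corresponds to MongoDB 5.0

    @dataclass
    class Server:
        address: str
        max_wire_version: int

    def effective_read_preference(available_servers, requested_pref, is_load_balanced=False):
        """Decide which read preference to use for an aggregation with $out/$merge.

        Any evidence of a pre-5.0 server forces a fallback to primary; with no
        available servers, or with all available servers at 5.0+ (or in load
        balanced mode, where 5.0+ is assumed), the requested preference is used.
        """
        if is_load_balanced:
            return requested_pref
        if any(s.max_wire_version < WIRE_VERSION_5_0 for s in available_servers):
            return "primary"
        return requested_pref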

Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-4224 Done 1.21.0
CXX-2414 Works as Designed
CSHARP-3957 Done 2.15.0
GODRIVER-2220 Done 1.8.0
JAVA-4396 Duplicate
NODE-3752 Works as Designed
PHPLIB-760 Fixed 1.10.1
PYTHON-3012 Duplicate
MOTOR-859 Duplicate
RUBY-2841 Fixed 2.17.0
RUST-1097 Duplicate
SWIFT-1412 Duplicate

 Description   

Currently the specification says:

- If an explicit (i.e. per-operation) read preference is specified for an
  aggregation with a write stage, drivers MUST attempt to use it. If that would
  result in a pre-5.0, secondary server being selected, drivers MUST instead
  select a server using a primary read preference.
- If no explicit read preference is specified but a default read preference is
  available to inherit (e.g. from the Collection), drivers MUST attempt to use
  it. If that would result in a pre-5.0, secondary server being selected,
  drivers MUST instead select a server using a primary read preference.

Defining it precisely in terms of available and eligible servers (the language of the server selection specification) would be more prescriptive and reduce differences in driver implementations. Something like this:

If an explicit (i.e. per-operation) or default (i.e. specified on a collection object)
read preference is specified for an aggregation with a write stage,
drivers use the following logic to determine the list of eligible servers:

If there are any *available* pre-5.0 servers, drivers MUST fall back to using a primary
read preference to determine the list of *eligible* servers.
Otherwise, drivers MUST apply the read preference to determine the list of *eligible* servers.

Examples (all with secondary read preference):

  1. a 5.0 primary and no secondaries -> apply read preference (no eligible servers)
  2. a 5.0 primary and a 5.0 secondary -> apply read preference (one eligible server)
  3. no available servers -> apply read preference (no eligible servers)
  4. a 4.4 primary and a 5.0 secondary -> fallback to primary (one eligible server)
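
A small harness can check the four examples above against the available/eligible terminology. This is a hypothetical sketch (the Server type and eligible_servers helper are illustrative, and eligibility is simplified to primary/secondary only), assuming maxWireVersion 9 for 4.4 and 13 for 5.0:

    from dataclasses import dataclass

    WIRE_VERSION_5_0 = 13  # maxWireVersion: 9 == 4.4, 13 == 5.0

    @dataclass
    class Server:
        kind: str  # "primary" or "secondary"
        max_wire_version: int

    def eligible_servers(available, requested_pref):
        # Any *available* pre-5.0 server forces a primary read preference;
        # otherwise the requested preference determines eligibility.
        pref = requested_pref
        if any(s.max_wire_version < WIRE_VERSION_5_0 for s in available):
            pref = "primary"
        kind = "secondary" if pref == "secondary" else "primary"
        return [s for s in available if s.kind == kind]

    P44, P50, S50 = Server("primary", 9), Server("primary", 13), Server("secondary", 13)

    assert eligible_servers([P50], "secondary") == []          # 1: apply RP, no eligible servers
    assert eligible_servers([P50, S50], "secondary") == [S50]  # 2: apply RP, one eligible server
    assert eligible_servers([], "secondary") == []             # 3: apply RP, no eligible servers
    assert eligible_servers([P44, S50], "secondary") == [P44]  # 4: fall back, one eligible server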


 Comments   
Comment by Githook User [ 10/Nov/21 ]

Author:

{'name': 'Jeremy Mikola', 'email': 'jmikola@gmail.com', 'username': 'jmikola'}

Message: DRIVERS-1969: Revise rules for $out/$merge read preferences (#1095)

  • DRIVERS-1969: Revise rules for $out/$merge read preferences

Only fall back to a primary read preference if there is evidence that the driver is connected to a pre-5.0 server.

Comment by Jeremy Mikola [ 07/Nov/21 ]

https://github.com/mongodb/specifications/pull/1095

Comment by Jeremy Mikola [ 03/Nov/21 ]

> but is there any other help I can still provide here?

Just to close the loop, david.storch and I sorted this out over Slack. He confirmed that mongos read preference behavior wasn't touched for PM-1770, so my earlier question about "what would mongos do" isn't really applicable. It's just going to enforce the $readPreference provided by the driver.

Comment by David Storch [ 02/Nov/21 ]

jmikola I'm not sure I follow all the details of this thread, but is there any other help I can still provide here? Jeff's understanding is correct that it is never legal to have a cluster with a 5.0 mongos but a pre-5.0 mongod. In fact, I'm pretty sure that a 5.0 mongos would refuse to connect to a pre-5.0 mongod.

Comment by Kaitlin Mahar [ 02/Nov/21 ]

Correct, the Swift driver delegates to mongoc_database_aggregate and mongoc_collection_aggregate for aggregation support.

Comment by Jeremy Mikola [ 02/Nov/21 ]

jeff.yemin and I met earlier today to discuss this and came up with the following approach, which I think will more closely align with rstam's table:

If there are one or more available servers and at least one of those is pre-5.0 (i.e. there is evidence of a pre-5.0 server), do not use the explicit/inherited read preference and instead fall back to using a primary read preference.

If there are no available servers, attempt to use the explicit/inherited read preference.

This means that a 5.0+ secondary in a cluster containing one or more pre-5.0 servers will never be utilized; however, it also avoids the earlier backwards-compatibility (BC) issue where a replica set consisting of a lone pre-5.0 primary would not be used as a fallback and users would instead encounter a server selection failure.

As for the implementation, Jeff suggests that this check can live within the "Find suitable servers by topology type and operation type" step of the server selection algorithm (step 3 for multi-threaded/async drivers, step 6 for single-threaded).
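
A sketch of what that could look like, with illustrative names (ServerDescription, TopologyDescription, and the simplified filter_by_read_preference are stand-ins for a driver's internals, not a real API):

    from dataclasses import dataclass

    WIRE_VERSION_5_0 = 13  # maxWireVersion 13 == MongoDB 5.0

    @dataclass
    class ServerDescription:
        kind: str  # "RSPrimary" or "RSSecondary"
        max_wire_version: int

    @dataclass
    class TopologyDescription:
        available_servers: list

    def filter_by_read_preference(topology, read_pref):
        # Simplified stand-in for the spec's full eligibility rules.
        want = "RSPrimary" if read_pref == "primary" else "RSSecondary"
        return [s for s in topology.available_servers if s.kind == want]

    def suitable_servers(topology, is_write_stage_aggregation, read_pref):
        # "Find suitable servers by topology type and operation type": evidence
        # of any available pre-5.0 server downgrades the read preference to
        # primary before the normal filtering (and, later, latency windowing).
        if is_write_stage_aggregation and any(
            s.max_wire_version < WIRE_VERSION_5_0 for s in topology.available_servers
        ):
            read_pref = "primary"
        return filter_by_read_preference(topology, read_pref)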


There is still a potential issue for drivers downstream of libmongoc (e.g. PHPLIB), which utilize mongoc_client_select_server. That API only takes a for_writes boolean and an optional read preference. An alternative API would need to be introduced that also takes an optional wire version, such that the following logic could be enforced:

If any servers have wire version < 13, select a primary (i.e. for_writes=true); otherwise, use the provided read preference.
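
Sketched in Python for brevity (the real API would be C), the decision such an alternative API would encode might look like the following; select_primary and select_by_pref are hypothetical callables standing in for libmongoc's selection paths:

    def select_server_for_write_stage(available, read_pref, select_primary, select_by_pref):
        # Wire version 13 corresponds to MongoDB 5.0.
        if any(s.max_wire_version < 13 for s in available):
            return select_primary()  # equivalent to for_writes=true
        return select_by_pref(read_pref)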

Jeff and I discussed how PHP could conceivably work around this without requiring API changes in libmongoc. If we were to stick with the current two-attempt approach (i.e. only fall back to a primary if the first attempt returns a pre-5.0 secondary), there would still be the BC break for a replica set consisting of a lone pre-5.0 primary. I'll continue to give that some thought, but I'd very much like to avoid an API change in libmongoc just for this feature.

I don't believe Swift is affected, as it may delegate to mongoc_collection_aggregate (or the equivalent database method) instead of handling server selection directly as PHPLIB does (cc: kaitlin.mahar).

Comment by Jeremy Mikola [ 01/Nov/21 ]

david.storch: Given the complexities of server selection discussed above, can you clarify how mongos handles server selection for executing $out/$merge pipelines on secondaries? Note that for a pre-5.0 mongos, the driver would always send RP(primary) to mongos, which isn't controversial. I'm specifically curious about the case where the driver forwards a read preference to a 5.0+ mongos and the shard(s) contain any of the aforementioned server configurations.

Edit: jeff.yemin pointed out that this may not be an issue for mongos since the upgrade process requires that shards be upgraded before mongos. Given that, any 5.0+ mongos potentially receiving a non-primary read preference could be expected to only have 5.0+ nodes in its shard(s).

I was specifically curious about mongos' behavior when a shard is a replica set consisting of only a single primary. I suppose in this case, a 5.0+ mongos could always honor the read preference either way. So RP(secondary) would simply result in a server selection failure and there's never a question of whether mongos would decide to fall back to the lone primary. And the aforementioned case of there only being a pre-5.0 primary isn't applicable, since that would never exist behind a 5.0+ mongos (per the upgrade order).

Comment by Jeremy Mikola [ 01/Nov/21 ]

I'll start by addressing the examples in the issue description:

Examples (all with secondary read preference):

1. a 5.0 primary and no secondaries -> apply read preference (no eligible servers)
2. a 5.0 primary and a 5.0 secondary -> apply read preference (one eligible server)
3. no available servers -> apply read preference (no eligible servers)
4. a 4.4 primary and a 5.0 secondary -> fallback to primary (one eligible server)

The existing text in Read preferences and server selection (from DRIVERS-823) instructs drivers to first apply any explicit/inherited read preference and, iff a pre-5.0 secondary is selected, fall back to a primary.

With respect to the examples above, the fourth example should result in the 5.0 secondary being utilized.


My current interpretation of the spec with respect to the table in the previous comment is as follows:

  • Row 4 (P-none, S-none): no servers are available, so any RP results in a server selection failure
  • Row 5 (P-none, S4): neither a primary nor a 5.0+ secondary is available, so any RP results in a server selection failure
  • Row 6 (P4, S-none): P4 will be selected for most RPs; however, RP(secondary) will result in a server selection failure because a pre-5.0 secondary is never selected to warrant fallback behavior. This is potentially a behavioral BC break
  • Row 7 (P4, S4): P4 will always be selected, either directly or as a fallback after S4 is selected
  • Row 8 (P4, S5): P4 will always be selected for RP(primary) or RP(primaryPreferred). S5 will always be selected for RP(secondary) or RP(secondaryPreferred). RP(nearest) is non-deterministic.
  • Row 9 (P4, S4+S5): P4 will always be selected for RP(primary) or RP(primaryPreferred). Other RPs are non-deterministic, since S4 could be selected and warrant falling back to P4.
  • Row 10 (P5, S4): P5 will always be selected, either directly or as a fallback after S4 is selected
  • Row 11 (P5, S4+S5): P5 will always be selected for RP(primary) or RP(primaryPreferred). Other RPs are non-deterministic, since S4 could be selected and warrant falling back to P5.
  • Row 13 (P-none, S5): RP(primary) always results in a server selection failure. For all other RPs, S5 will be selected
  • Row 14 (P5, S-none): RP(secondary) always results in a server selection failure. For all other RPs, P5 will be selected
  • Row 15 (P5, S5): P5 or S5 will be selected according to the RP. Only RP(nearest) is non-deterministic

I've bolded the lines where my interpretation differs from Robert's table.

I realize this leads to more non-deterministic cases for mixed-version clusters; however, my intention was to allow drivers more flexibility given their server selection implementations. I did not want to require drivers to modify their server selection algorithm just for this feature. Needing to consider whether "all available servers are 5.0+" would require more invasive changes.

As-is, drivers using the one-attempt approach can use their existing algorithm and only fall back to a primary (bypassing a full second attempt) if a pre-5.0 secondary is selected. The two-attempt approach is not much different, but would be necessary for drivers that can't easily bypass selection to do the fallback (e.g. PHP atop libmongoc).
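
As a rough illustration of the difference between the two approaches (hypothetical helpers: choose filters an already-fetched topology snapshot, while select_server stands in for a full selection call such as mongoc_client_select_server):

    WIRE_VERSION_5_0 = 13

    def one_attempt(choose, topology_snapshot, read_pref):
        # Fallback inside a single selection pass: re-filter the same topology
        # snapshot with primary, bypassing a full second attempt.
        server = choose(topology_snapshot, read_pref)
        if server and server.kind == "secondary" and server.max_wire_version < WIRE_VERSION_5_0:
            server = choose(topology_snapshot, "primary")
        return server

    def two_attempt(select_server, read_pref):
        # Full second selection, with its own wait/timeout loop, when the first
        # attempt returns a pre-5.0 secondary (e.g. PHP atop libmongoc).
        server = select_server(read_pref)
        if server.kind == "secondary" and server.max_wire_version < WIRE_VERSION_5_0:
            server = select_server("primary")
        return server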

The "[potential] behavioral BC break" scenario for Row 6 (P4, S-none) may need to be addressed regardless, as I didn't consider that in DRIVERS-823. But I'm not sure how we could easily handle that without invasive server selection changes to determine versions across all available servers in the topology. Without doing so, a (P4, S-none) cluster would be hard to distinguish from a (P4, S5) cluster where the secondaries are inaccessible.

Comment by Robert Stam [ 29/Oct/21 ]

I'm trying to understand with precision how to select a server for $out (on replica sets), which has been confounding me.

To make sure I properly understand exactly how server selection should work in every possible scenario, I've prepared the following table to describe my interpretation.

  • The first two columns represent different cluster configurations.
  • "none" means no available server.
  • P4 and P5 mean a 4.x or 5.x primary.
  • S4 and S5 mean a 4.x or 5.x secondary (possibly more than one).
  • "error" means no suitable server (the server selector will keep trying and eventually time out).
  • The cell at the intersection of a cluster configuration and ReadPreference indicates what server(s) will be selected.

Rows 4-11 are all the "fallback to primary" scenarios

Rows 13-15 are all the "apply the read preference" scenarios

In English I would summarize this as:

  • if there are any available servers and all available servers are 5.0+, apply the read preference
  • otherwise, fall back to primary (a sketch of this rule follows below)
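
That summary can be stated as a predicate (a hypothetical sketch; note it differs from the rule eventually adopted only when no servers are available, where the final rule attempts the requested read preference instead of falling back):

    def apply_read_preference(available_servers):
        # Robert's summary: apply the RP iff there is at least one available
        # server and every available server is 5.0+ (maxWireVersion >= 13).
        return bool(available_servers) and all(
            s.max_wire_version >= 13 for s in available_servers
        )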

[Attached table: image-2021-10-29-16-41-37-013.png]
