[SERVER-44603] Consider having tailable readPreference "primary" queries killed on stepdown Created: 13/Nov/19 Updated: 07/Apr/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 4.2.0, 4.3.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Alan Zheng |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Queries with an explicit readPreference: "primary" are currently allowed to survive stepdown. This behavior is reasonable when the results are bounded, i.e., some results were already returned and the remaining results from a getMore are just as consistent as if the node were still primary. However, for clients tailing a capped collection (e.g., the oplog), the driver and server can no longer guarantee that a node remains primary for the lifetime of a query that was opened against a primary. Applications that need this guarantee must implement something on their end, such as periodically re-issuing the query or having a side channel monitor the replica set state. |
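As an illustration of the side-channel workaround mentioned above, here is a minimal Python sketch. It assumes the application can obtain batches from its tailable cursor and can fetch a `hello` reply from the same node; `tail_while_primary`, `reports_primary`, `NoLongerPrimary`, and the `fetch_hello` callable are hypothetical names for this example, not driver APIs. Only the `isWritablePrimary` and legacy `ismaster` reply fields come from the server.

```python
from typing import Any, Callable, Iterable, Iterator, Mapping


class NoLongerPrimary(Exception):
    """Raised by this sketch when the tailed node stops reporting itself as primary."""


def reports_primary(hello_reply: Mapping[str, Any]) -> bool:
    # A `hello` reply carries `isWritablePrimary`; the legacy command's
    # reply carries `ismaster`. A missing field is treated as "not primary".
    return bool(hello_reply.get("isWritablePrimary", hello_reply.get("ismaster", False)))


def tail_while_primary(
    batches: Iterable[Iterable[Any]],
    fetch_hello: Callable[[], Mapping[str, Any]],
) -> Iterator[Any]:
    """Yield documents batch by batch, re-checking the node's role after each fetch.

    `batches` stands in for successive getMore results and `fetch_hello`
    for whatever the application uses to ask the same node for its state
    (both are assumptions of this sketch).
    """
    for batch in batches:
        # The batch was just fetched; discard it if the node has stepped down.
        if not reports_primary(fetch_hello()):
            raise NoLongerPrimary("node stepped down while the cursor was open")
        yield from batch
```

A check of this kind only narrows the window: the node can still step down between the role check and the next getMore, which is why the ticket asks for a guarantee enforced by the server.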
| Comments |
| Comment by Bernard Gorman [ 10/Aug/20 ] |
|
I agree with the original ticket description re: the distinction between a bounded regular query and a tailable cursor, and I can see a fair case that users would desire different behaviour for each. However, I think Arnie is correct that there are also plenty of occasions where a user would prefer a long-running regular query to stay off the new Primary if a node steps up during election, or conversely where they might want the query to migrate to the new Primary on stepdown. But we obviously can't revert to something like the old 4.0 (?) behaviour where we kill all queries on stepdown, as that would be far too disruptive to anything that isn't a change stream (though PM-915 may eliminate this difference to a large extent).
Since this seems like a case where deciding on the most desirable behaviour is a toss-up and is as likely to annoy customers as to help them, why not give them the option to choose the appropriate behaviour? What if we were to introduce a new parameter in the readPreference spec, something like {strict: <boolean>} or, more explicitly, {reassessAfterElection: <boolean>}? This would default to 'false' to maintain the current behaviour, but if the client sets it to 'true' then we would revalidate the read preference each time we check out a cursor.
That way, a find, aggregate or getMore which is running during an election would be allowed to complete, but the following getMore will throw InterruptedDueToReplStateChange if the node's new role no longer satisfies the read preference. Any operations which are resumable would then be re-targeted to the appropriate post-election nodes and re-issued. That way, customers could choose (on a per-operation basis) whether they want to prioritise cross-election query survival OR keeping workloads on/off particular nodes.
The current behaviour would be maintained by default and would therefore not be a Versioned API violation, and the change would be relatively simple - we would not have to build new machinery to proactively seek out and kill operations every time there's an election. |
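The revalidate-on-checkout idea in this comment could be modelled roughly as follows. This is a toy Python model, not server code: `Role`, `Cursor`, `satisfies`, and `check_out_cursor` are hypothetical names, read preferences are reduced to "primary" and "secondary", and the exception class is only a stand-in for the server's InterruptedDueToReplStateChange error.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class InterruptedDueToReplStateChange(Exception):
    """Stand-in for the server error of the same name."""


@dataclass
class Cursor:
    read_preference: str  # "primary" or "secondary" in this toy model
    # Hypothetical flag from the proposal; False keeps the current behaviour.
    reassess_after_election: bool = False


def satisfies(read_preference: str, role: Role) -> bool:
    # Simplified matching: the preference must equal the node's current role.
    return read_preference == role.value


def check_out_cursor(cursor: Cursor, node_role: Role) -> Cursor:
    """Model of the check performed when a getMore checks out a cursor."""
    if cursor.reassess_after_election and not satisfies(cursor.read_preference, node_role):
        raise InterruptedDueToReplStateChange(
            f"node is now {node_role.value}; read preference is {cursor.read_preference}"
        )
    return cursor
```

In this model the check runs only at checkout, so an operation already in flight during an election completes and the next getMore is the one that fails, which matches the ordering described in the comment. With the flag left at its default, the cursor survives the role change.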
| Comment by Andy Schwerin [ 17/Mar/20 ] |
|
Oplog tailing aside, I think the current behavior is correct. For the change stream and oplog tailing case, I'm less certain, because those queries are logically moving through time. I'm still not thrilled with hanging up on stepdown for queries that don't have a clear route to resumption (not restart), but maybe for collections where resumption is possible (oplog/change stream) this change or a similar one could make sense. |
| Comment by Daniel Gottlieb (Inactive) [ 13/Nov/19 ] |
|
Alternatively, it may be worthwhile to update the documentation.
|