[SERVER-59659] Investigate allowing the change stream oplog query to run using the pipeline collation Created: 28/Aug/21  Updated: 06/Mar/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Justin Seyster Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-59660 Allow change stream rewrites for $mat... Closed
Related
is related to SERVER-59660 Allow change stream rewrites for $mat... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

Change streams have always prevented a user-defined collation from being applied to the filter pushed down into the oplog, primarily so that a case-insensitive collation cannot cause a single-collection stream to return events from multiple similarly-named collections (see SERVER-31443).

This became an issue during PM-1942, since we are now rewriting user filters into the oplog, and the user may have specified a collation for their $match filters. We therefore chose, in SERVER-59426 and SERVER-59840, to abandon any attempt to rewrite the user's query if they have requested a non-simple collation. This behavior ensures that a user collation does not get applied to the oplog query, which could have unintended consequences.

However, in practice the changes in SERVER-56872 should actually make the oplog query collation-agnostic. All predicates on namespace fields now use regexes, which ignore the collation. The remaining predicates all operate on fields that cannot have a string value or fields whose values belong to a limited set of strings that cannot be confused for each other by any collation (e.g., the "op" field).
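
As a rough illustration of why regex-based namespace predicates sidestep the original concern, the following Python sketch contrasts equality under a case-insensitive comparison with an anchored regex match. It is not server code: `collated_equal` and `regex_ns_match` are hypothetical helpers, `casefold()` is only an approximation of a case-insensitive collation, and the namespaces are made up.

```python
import re

def collated_equal(a: str, b: str) -> bool:
    # Hypothetical stand-in for string equality under a case-insensitive
    # collation; casefold() only approximates real ICU collation behavior.
    return a.casefold() == b.casefold()

def regex_ns_match(ns: str, target: str) -> bool:
    # Hypothetical stand-in for an anchored regex predicate on the namespace.
    # The regex is case-sensitive regardless of any collation.
    return re.fullmatch(re.escape(target), ns) is not None

# Made-up namespaces as they might appear in oplog entries.
oplog_namespaces = ["test.orders", "test.Orders", "test.ORDERS", "test.users"]
target = "test.orders"

# Equality under the collation also matches similarly-named collections.
by_equality = [ns for ns in oplog_namespaces if collated_equal(ns, target)]

# The regex predicate matches only the exact namespace.
by_regex = [ns for ns in oplog_namespaces if regex_ns_match(ns, target)]

print(by_equality)
print(by_regex)
```

In this sketch the equality comparison picks up all three spellings of the collection name, while the regex predicate returns only the exact namespace, which is the property the description relies on.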

It may therefore now be safe to allow the oplog query to have an arbitrary collation.



 Comments   
Comment by Bernard Gorman [ 03/Oct/21 ]

Note that one consideration here is that we would have to prevent any of our oplog rewrites from converting a user-defined string equality filter into a regex, since that would cause the predicate to incorrectly ignore the collation. For instance, we currently rewrite some string-equality predicates on field 'ns' into a regex.
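
The concern can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's rewrite logic: the event documents are invented, `user_filter_matches` and `rewritten_filter_matches` are hypothetical helpers, and `casefold()` merely approximates a case-insensitive user collation.

```python
import re

# Invented change events; only the collection name matters for this sketch.
events = [
    {"ns": {"db": "test", "coll": "orders"}},
    {"ns": {"db": "test", "coll": "Orders"}},
    {"ns": {"db": "test", "coll": "users"}},
]

def user_filter_matches(event: dict, coll: str) -> bool:
    # What the user asked for: equality on the collection name, evaluated
    # under a case-insensitive collation (approximated with casefold()).
    return event["ns"]["coll"].casefold() == coll.casefold()

def rewritten_filter_matches(event: dict, coll: str) -> bool:
    # Hypothetical rewrite of the same equality into an anchored regex on the
    # combined "db.coll" namespace string. The regex ignores the collation.
    oplog_ns = f'{event["ns"]["db"]}.{event["ns"]["coll"]}'
    pattern = r"[^.]+\." + re.escape(coll)
    return re.fullmatch(pattern, oplog_ns) is not None

expected = [e["ns"]["coll"] for e in events if user_filter_matches(e, "Orders")]
rewritten = [e["ns"]["coll"] for e in events if rewritten_filter_matches(e, "Orders")]

print(expected)
print(rewritten)
```

Here the user's collated equality would return events for both `orders` and `Orders`, whereas the regex form returns only `Orders`, so a rewrite of this kind would silently drop events the user expects to see.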

While this isn't critical - very few people are likely to run change streams with a non-simple collation - it would be nice to get rid of this restriction and the temporarilyChangeCollator "workaround" along with it, since this mechanism only exists because of change streams and is not used anywhere else in the server. As a (less desirable) alternative, if it is not feasible to simply allow the user collation to be applied to the oplog query, we could selectively permit predicates on collation-agnostic fields as outlined in SERVER-59660.

I'd suggest placing this ticket in the QO Quick Wins bucket.
