[SERVER-84198] Facilitate multiple collations within the same change stream. Created: 14/Dec/23  Updated: 26/Dec/23  Resolved: 26/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Felipe Gasper
Assignee: Backlog - Query Execution
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-25954 Support more granular collation speci... Backlog
Issue split
split from SERVER-82815 Expose server’s index key creation vi... Closed
Assigned Teams:
Query Execution
Participants:

Description

Mongosync applies document queries in two contexts:
1) partitioning during initial sync
2) cluster-wide change streams

The initial-sync queries are per-collection, so each one uses its collection's default collation. The change stream, though, spans multiple collections, so it uses the simple (binary) collation. Thus, if we search on "_id > aaa && _id < zzz", then for a collection whose default collation is not simple (a case-insensitive one, say) we'll match _id=BBB during initial sync but not in the change stream.
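
The mismatch described above can be reproduced without a server. The following is only a rough sketch: Python's code-point string comparison stands in for the simple collation, and str.casefold() stands in for a case-insensitive collation. Neither is what the server actually runs (a real collation would go through ICU), and both function names are made up for illustration.

```python
def in_range_simple(value, low, high):
    # Code-point comparison; for strings this orders the same way as a
    # byte-wise comparison of their UTF-8 encodings.
    return low < value < high


def in_range_case_insensitive(value, low, high):
    # casefold() is an assumed stand-in for a case-insensitive collation;
    # it is NOT equivalent to an ICU collation.
    return low.casefold() < value.casefold() < high.casefold()


# Change-stream view (simple collation): "B" sorts before "a".
print(in_range_simple("BBB", "aaa", "zzz"))

# Initial-sync view (collection's case-insensitive default collation).
print(in_range_case_insensitive("BBB", "aaa", "zzz"))
```

The first call prints False and the second prints True, so the same document falls inside the partition during initial sync and outside it in the change stream.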

SERVER-82815 will provide a solution for this by allowing aggregation to convert _id, aaa, zzz, and BBB to whatever byte sequence the server uses to represent them in indexes.

This problem worsens in the context of [document filtering|REP-1954], where the query will come from the customer. Here we either have to limit the scope of support for strings in queries pretty dramatically or implement some sort of query-transform logic based on SERVER-82815's new operator ... but even that would likely only support certain limited use cases.

We can soften the problem somewhat by having customers migrate like-collated collections in concurrent mongosync sessions. Given limitations on the number of concurrent change streams, though, this won't scale well to multi-tenant setups where dozens, even hundreds, of collations may coexist on a given source cluster.
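
To make the scaling concern concrete, here is a hypothetical sketch of how collections might be bucketed by default collation, one bucket per mongosync session. The function name, the input mapping, and the namespaces are all invented for illustration; mongosync does not expose such a helper as far as this ticket states.

```python
from collections import defaultdict


def group_by_collation(default_collations):
    """Bucket namespaces by their default collation document.

    `default_collations` maps a namespace to its collation document,
    or to None when the collection uses the simple collation.
    """
    groups = defaultdict(list)
    for namespace, collation in default_collations.items():
        # Dicts are unhashable, so use a sorted tuple of items as the key;
        # the empty tuple represents the simple collation.
        key = tuple(sorted(collation.items())) if collation else ()
        groups[key].append(namespace)
    return dict(groups)


# Hypothetical multi-tenant source cluster.
example = {
    "tenant1.users": {"locale": "en", "strength": 2},
    "tenant2.users": {"strength": 2, "locale": "en"},
    "tenant3.users": {"locale": "fr"},
    "tenant4.logs": None,
}

groups = group_by_collation(example)
print(len(groups))  # one session (and one change stream) per bucket
```

Each bucket would need its own session and therefore its own change stream, which is where the limit on concurrent change streams starts to bite.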

It seems that, ultimately, we can't "gracefully" support collations without some ability to apply multiple collations in a given change stream.


Generated at Thu Feb 08 06:54:20 UTC 2024 using Jira 9.7.1.