  1. Core Server
  2. SERVER-84198

Facilitate multiple collations within the same change stream.

    • Type: New Feature
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Query Execution

      Mongosync applies document queries in two contexts:
      1) partitioning during initial sync
      2) cluster-wide change streams

      The initial-sync queries are per-collection and so use each collection's default collation. The change stream, though, spans multiple collections, so it uses the simple (binary) collation. Thus, if we filter on "_id > aaa && _id < zzz" against a collection whose default collation is case-insensitive, we'll match _id=BBB during initial sync but not in the change stream.
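
      The mismatch can be reproduced outside the server. The sketch below is illustrative only: it assumes the simple collation behaves like a byte-wise comparison of UTF-8 strings, and it uses Python's str.casefold() as a rough stand-in for a case-insensitive collation (real ICU collations are more involved). Both function names are hypothetical.

```python
def in_range_simple(value: str, low: str, high: str) -> bool:
    """Range check using byte-wise ordering, approximating the simple collation."""
    v, lo, hi = (s.encode("utf-8") for s in (value, low, high))
    return lo < v < hi


def in_range_case_insensitive(value: str, low: str, high: str) -> bool:
    """Range check after case folding; a stand-in (assumption) for a
    case-insensitive collection collation, not ICU's actual algorithm."""
    v, lo, hi = (s.casefold() for s in (value, low, high))
    return lo < v < hi


# "B" (0x42) sorts before "a" (0x61) byte-wise, so BBB falls outside the range.
change_stream_match = in_range_simple("BBB", "aaa", "zzz")
# After case folding, "bbb" sits between "aaa" and "zzz".
initial_sync_match = in_range_case_insensitive("BBB", "aaa", "zzz")
```

      Under these assumptions the same _id is inside the range for the per-collection query and outside it for the simple-collated change stream.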

      SERVER-82815 will provide a solution for this by allowing aggregation to convert _id, aaa, zzz, and BBB to whatever byte sequence the server uses to represent them in indexes.

      This problem worsens in the context of [document filtering|REP-1954], where the query will come from the customer. Here we either have to limit the scope of support for strings in queries pretty dramatically or implement some sort of query-transform logic based on SERVER-82815's new operator ... but even that would likely only support certain limited use cases.

      We can mitigate the problem somewhat by having customers migrate like-collated collections in concurrent mongosync sessions. Given limitations on the number of concurrent change streams, though, this won't scale well to multi-tenant setups where dozens, even hundreds, of collations may coexist on a given source cluster.
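
      To make the scaling concern concrete, here is a minimal sketch of how collections might be bucketed by default collation, one bucket per mongosync session. The function name, the namespace-to-collation mapping, and the sample data are all hypothetical; nothing here is mongosync's actual behavior.

```python
from collections import defaultdict


def group_by_collation(collections: dict) -> dict:
    """Bucket namespaces by default collation document.

    `collections` maps a namespace to its collation dict, or None when the
    collection has no default collation (treated here as "simple").
    """
    groups = defaultdict(list)
    for namespace, collation in collections.items():
        spec = collation or {"locale": "simple"}
        # Sort the items so equal collation documents produce the same key.
        key = tuple(sorted(spec.items()))
        groups[key].append(namespace)
    return dict(groups)


# Hypothetical source cluster with three tenants.
sample = {
    "tenant_a.users": {"locale": "en", "strength": 2},
    "tenant_b.users": {"strength": 2, "locale": "en"},
    "tenant_c.users": None,
}
sessions = group_by_collation(sample)
```

      Each distinct key would need its own session and change stream, so the session count grows with the number of distinct collations rather than with the number of collections.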

      It seems that, ultimately, we can't "gracefully" support collations without some ability to apply multiple collations in a given change stream.

            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            felipe.gasper@mongodb.com Felipe Gasper