[SERVER-82815] Expose server’s index key creation via aggregation Created: 06/Nov/23 Updated: 16/Jan/24 Resolved: 15/Jan/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.2.1, 7.3.0-rc0, 7.0.6, 5.0.25, 4.4.29, 6.0.14 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Felipe Gasper | Assignee: | Rui Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v7.2, v7.1, v7.0, v6.0, v5.0, v4.4 |
| Sprint: | QE 2023-12-11, QE 2024-01-08, QE 2024-01-22 |
| Participants: | |
| Description |
Overview

Currently aggregations can only set a single collation over the entire pipeline. This makes some sense for aggregations that originate from collections, but it’s more problematic for change streams that span multiple collections (as, e.g., mongosync uses). It's quite easy to have a data consistency problem if the client forgets/overlooks that string comparisons in such a change stream are simple-collated, regardless of the respective collections’ default collations. REP-3312 was such a problem.

This prompted a Critical Advisory for mongosync, which led (in part) to the present [Migration & Backup Correctness|INIT-532] initiative, which includes mongosync’s current [collation-fixes epic|REP-3672].

This task proposes to facilitate a fix for this by exposing the server’s internal index key via an aggregation operator, which I’ll tentatively call $_internalIndexKey. This operator would look thus:

… and would output, as a binary blob, the index key that the server would create for that string & collation. This will facilitate REP-3312’s fix.

Numeric Types

As a convenience, this also envisions that the $_internalIndexKey operator will normalize numeric types. Thus, mongosync will have an easy way to tell via aggregation that { $numberLong: 42 } and { $numberDouble: 42 } are, in fact, the same number. See comments and linked tickets for context on how this helps us.

Rejected Alternatives

See REP-3672’s (in-progress) technical design for a list of considered alternative solutions.

See |
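To make the Overview's concern concrete, here is a rough Python sketch contrasting a simple (binary) string comparison with a case-insensitive one. This is not the server's algorithm and says nothing about the real $_internalIndexKey output format: the helper names `simple_key` and `case_insensitive_key` are hypothetical, and `str.casefold()` is only a stand-in for an ICU-generated collation key.

```python
import unicodedata


def simple_key(s: str) -> bytes:
    """Key under a simple collation: the raw UTF-8 bytes, compared as-is."""
    return s.encode("utf-8")


def case_insensitive_key(s: str) -> bytes:
    """Hypothetical stand-in for a case-insensitive collation key.

    Assumption: real keys would come from ICU on the server; this only
    normalizes and case-folds the string so that case differences vanish.
    """
    return unicodedata.normalize("NFKC", s).casefold().encode("utf-8")


a, b = "Apple", "apple"

# A client comparing with simple collation sees two different values...
print(simple_key(a) == simple_key(b))  # False

# ...while a collection with a case-insensitive default collation would
# treat them as the same value.
print(case_insensitive_key(a) == case_insensitive_key(b))  # True
```

A client that compares only the simple keys would disagree with such a collection about whether these two strings are the same value, which is the kind of mismatch this ticket aims to let clients detect.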
| Comments |
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: GitOrigin-RevId: e099a44278552998e042815238d54bf4904de7fd
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: GitOrigin-RevId: 9b66053c3e4e1e314cd0946b3de51551f29466f3
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: GitOrigin-RevId: 9bfa65afb2a29322dafd1e203be5731ea584a23d
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: GitOrigin-RevId: cc7c6598bd320cd045835196b57805ffd784c962
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: GitOrigin-RevId: 87dbb5b0f66c17ed850b378fe437e00fcafab3f8
| Comment by Githook User [ 15/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: (cherry picked from commit d3c0fa0caad7ba07c53669fc3247a6560fe8fc7f)
| Comment by Githook User [ 12/Jan/24 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}
Message: GitOrigin-RevId: d3c0fa0caad7ba07c53669fc3247a6560fe8fc7f
| Comment by Felipe Gasper [ 14/Dec/23 ] |
|
What dave.rolsky@mongodb.com said: we need the “real” index key creation logic, now for multiple reasons.
| Comment by Dave Rolsky [ 08/Dec/23 ] |
|
I have a request closely related to this, which is that this new $_indexKey operator should also normalize numeric values, so that two docs with the same numeric _id, but with different numeric types, normalize to the same value. To make this concrete, consider these docs:
{"_id": {"$numberInt":"42"}}
They should all produce the same index key value. I think this is already baked into the request in this ticket, but I just wanted to call this out.
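As a minimal sketch of what "normalize to the same value" could mean, the Python below maps canonical Extended JSON numeric wrappers to a single type-independent value. Everything here is illustrative: `normalize_numeric` is a hypothetical helper, not the server's key-generation logic, and the `$numberLong` and `$numberDouble` documents are assumed examples added alongside the `$numberInt` one shown above. Edge cases such as NaN are deliberately ignored.

```python
from decimal import Decimal

# Canonical Extended JSON wrappers treated as numeric in this sketch.
_NUMERIC_WRAPPERS = {"$numberInt", "$numberLong", "$numberDouble", "$numberDecimal"}


def normalize_numeric(value):
    """Return a type-independent Decimal for a wrapped number.

    Hypothetical helper: any value that is not a single-key numeric
    wrapper is returned unchanged.
    """
    if isinstance(value, dict) and len(value) == 1:
        wrapper, text = next(iter(value.items()))
        if wrapper in _NUMERIC_WRAPPERS:
            if wrapper == "$numberDouble":
                # Go through float so the exact binary value is used.
                return Decimal(float(text))
            return Decimal(text)
    return value


docs = [
    {"_id": {"$numberInt": "42"}},
    {"_id": {"$numberLong": "42"}},      # assumed example, not from the ticket
    {"_id": {"$numberDouble": "42.0"}},  # assumed example, not from the ticket
]

# All three _id values collapse to one normalized key.
keys = {normalize_numeric(doc["_id"]) for doc in docs}
print(len(keys))  # 1
```

The real operator would emit the server's own binary key rather than a `Decimal`; the point is only that equal numbers of different BSON types end up with one shared representation.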
| Comment by Felipe Gasper [ 01/Dec/23 ] |
|
Moving to NS per ivan.fefer@mongodb.com’s recommendation. |