[SERVER-82815] Expose server’s index key creation via aggregation Created: 06/Nov/23  Updated: 16/Jan/24  Resolved: 15/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.1, 7.3.0-rc0, 7.0.6, 5.0.25, 4.4.29, 6.0.14

Type: New Feature Priority: Major - P3
Reporter: Felipe Gasper Assignee: Rui Liu
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Issue split
split to SERVER-84198 Facilitate multiple collations within... Closed
Related
related to SERVER-84462 Consider offering a way for $toHashed... Backlog
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.2, v7.1, v7.0, v6.0, v5.0, v4.4
Sprint: QE 2023-12-11, QE 2024-01-08, QE 2024-01-22
Participants:

 Description   

Overview

Currently, an aggregation can set only a single collation for the entire pipeline. That makes sense for aggregations that originate from a single collection, but it is problematic for change streams that span multiple collections (as used by, e.g., mongosync). It is easy to introduce a data consistency problem if the client overlooks that string comparisons in such a change stream use the simple collation, regardless of the respective collections’ default collations.

REP-3312 was such a problem. This prompted a Critical Advisory for mongosync, which led (in part) to the present [Migration & Backup Correctness|INIT-532] initiative, which includes mongosync’s current [collation-fixes epic|REP-3672].

This task proposes to facilitate a fix by exposing the server’s internal index key creation via an aggregation operator, which I’ll tentatively call $_internalIndexKey. The operator would look like this:

{ $_internalIndexKey: {
    input: "abc", // … but can be any arbitrary BSON value
    collation: { locale: "en", strength: 1 },
} }

… and would output, as a binary blob, the index key that the server would create for that value and collation.
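
As a rough illustration of how a client might use such an operator, the sketch below builds a $match stage that compares a field to a literal under a specific collation by comparing the two resulting key blobs. The helper names (indexKeyExpr, collatedEqualsStage) are hypothetical, the operator name is the tentative one from this ticket (the commits below refer to $_internalKeyStringValue), and whether the server accepts this exact shape is an assumption.

```javascript
// Tentative operator name from this ticket; treat it as configurable,
// since the name that actually shipped may differ.
const OPERATOR_NAME = "$_internalIndexKey";

// Hypothetical helper: wraps an input expression and a collation in the
// proposed operator document.
function indexKeyExpr(input, collation, operatorName = OPERATOR_NAME) {
  return { [operatorName]: { input, collation } };
}

// Hypothetical helper: a $match stage that treats the field and the literal
// as equal when their index keys under `collation` are equal.
function collatedEqualsStage(fieldPath, literal, collation) {
  return {
    $match: {
      $expr: {
        $eq: [
          indexKeyExpr(fieldPath, collation),
          indexKeyExpr({ $literal: literal }, collation),
        ],
      },
    },
  };
}

// Build a stage for a case-insensitive (strength 1) English comparison.
const stage = collatedEqualsStage("$fullDocument.name", "abc", {
  locale: "en",
  strength: 1,
});
console.log(JSON.stringify(stage, null, 2));
```

Because each expression carries its own collation, different stages (or different branches of a $switch) could apply different collations within one pipeline, which is the gap described in the overview.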

This will facilitate REP-3312’s fix.

Numeric Types

As a convenience, this also envisions that the $_internalIndexKey operator will normalize numeric types. Thus, mongosync will have an easy way to tell via aggregation that { $numberLong: 42 } and { $numberDouble: 42 } are, in fact, the same number. See comments and linked tickets for context on how this helps us.
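
One way a client could take advantage of this is to project the key for each _id and then compare the returned blobs byte for byte, with no type-aware numeric comparison on the client. The sketch below assumes the tentative operator name and argument shape from this ticket; idKeyProjection and sameKey are hypothetical names, and extracting raw bytes from a driver's binary type is left to the caller.

```javascript
// Hypothetical helper: a $project stage that adds the proposed index key for
// each document's _id. Operator name and shape are assumptions from the ticket.
function idKeyProjection(collation, operatorName = "$_internalIndexKey") {
  return {
    $project: {
      _id: 1,
      idKey: { [operatorName]: { input: "$_id", collation } },
    },
  };
}

// Byte-wise equality of two key blobs, given as Uint8Array (or Buffer).
function sameKey(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

const projection = idKeyProjection({ locale: "simple" });
console.log(JSON.stringify(projection));
```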

Rejected Alternatives

See REP-3672’s (in-progress) technical design for a list of considered alternative solutions.

See SERVER-84198 for an additional request to facilitate full collation support with document filtering in mongosync.



 Comments   
Comment by Githook User [ 16/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

GitOrigin-RevId: e099a44278552998e042815238d54bf4904de7fd
Branch: v4.4
https://github.com/mongodb/mongo/commit/1bab72cc42e41b01a8063e07a1eda35a37db3c85

Comment by Githook User [ 16/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

GitOrigin-RevId: 9b66053c3e4e1e314cd0946b3de51551f29466f3
Branch: v5.0
https://github.com/mongodb/mongo/commit/89b4c3c8e6711bba45b4296d48b14b73216e6a56

Comment by Githook User [ 16/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

GitOrigin-RevId: 9bfa65afb2a29322dafd1e203be5731ea584a23d
Branch: v6.0
https://github.com/mongodb/mongo/commit/243e2ddd78b49208ac0e32052af0fcb948e74be0

Comment by Githook User [ 16/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

GitOrigin-RevId: cc7c6598bd320cd045835196b57805ffd784c962
Branch: v7.0
https://github.com/mongodb/mongo/commit/d88d10c9e3b395b0ab44692ada0ebe3086cdce93

Comment by Githook User [ 16/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

GitOrigin-RevId: 87dbb5b0f66c17ed850b378fe437e00fcafab3f8
Branch: v7.0
https://github.com/mongodb/mongo/commit/ee1da868742ded6fcc4642b3f6936814a57c08c6

Comment by Githook User [ 15/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

(cherry picked from commit d3c0fa0caad7ba07c53669fc3247a6560fe8fc7f)
Branch: v7.2
https://github.com/mongodb/mongo/commit/27f9a614d7d8d6ba8cc7cff8ff4297f6dd49cc78

Comment by Githook User [ 12/Jan/24 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-82815 Implement expression $_internalKeyStringValue

GitOrigin-RevId: d3c0fa0caad7ba07c53669fc3247a6560fe8fc7f
Branch: master
https://github.com/mongodb/mongo/commit/531c4daba675e1a8a5213af7823ac01bccf0e8f7

Comment by Felipe Gasper [ 14/Dec/23 ]

What dave.rolsky@mongodb.com said: we need the “real” index key creation logic, now for multiple reasons.

Comment by Dave Rolsky [ 08/Dec/23 ]

I have a request closely related to this, which is that this new $_indexKey operator should also normalize numeric values, so that two docs with the same numeric _id, but with different numeric types, normalize to the same value.

To make this concrete, consider these docs:

{"_id": {"$numberInt":"42"}}
{"_id": {"$numberLong":"42"}}
{"_id": {"$numberDouble":"42.0"}}
{"_id": {"$numberDecimal":"42.0"}}

 

They should all produce the same index key value.
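
The requested equivalence can be modelled client-side with a toy function. This is not the server's key encoding; it only checks that the four Extended JSON wrappers above denote the same number. The function name is hypothetical, and converting through a JavaScript Number loses precision for large 64-bit integers and for Decimal128 values, which a real implementation would have to handle.

```javascript
// Toy model only: collapses Extended JSON numeric wrappers to a JS Number.
// It does NOT reproduce the server's index key bytes.
const NUMERIC_WRAPPERS = [
  "$numberInt",
  "$numberLong",
  "$numberDouble",
  "$numberDecimal",
];

function toyNumericValue(ejson) {
  for (const key of NUMERIC_WRAPPERS) {
    if (Object.prototype.hasOwnProperty.call(ejson, key)) {
      return Number(ejson[key]);
    }
  }
  throw new TypeError("not an Extended JSON numeric wrapper");
}

// The four _id values from the comment above.
const ids = [
  { $numberInt: "42" },
  { $numberLong: "42" },
  { $numberDouble: "42.0" },
  { $numberDecimal: "42.0" },
];

// If the values are numerically equal, only one distinct value remains.
const distinct = new Set(ids.map(toyNumericValue));
console.log(distinct.size);
```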

I think this is already baked into the request in this ticket, but I just wanted to call this out.

Comment by Felipe Gasper [ 01/Dec/23 ]

Moving to NS per ivan.fefer@mongodb.com’s recommendation.

Generated at Thu Feb 08 06:50:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.