[SERVER-76855] Audit mongos for improper usage of collation and incorrectly assuming simple collation rather than collection default Created: 04/May/23  Updated: 22/Jan/24  Resolved: 10/Oct/23

Status: Closed
Project: Core Server
Component/s: Distributed Query Execution
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Mihai Andrei
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-80145 Avoid explicit callers of ChunkManage... Closed
Related
related to SERVER-85572 Follow up on audit in mongos for impr... Open
related to SERVER-81991 Delete RoutingCollator after branchin... Open
is related to SERVER-71896 Validate if a query with _id or shard... Closed
is related to SERVER-76857 Have useTwoPhaseProtocol use the coll... Closed
is related to SERVER-24433 Distinguish between the simple collat... Backlog
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QE 2023-06-26, QE 2023-07-10, QE 2023-07-24, QE 2023-08-07, QE 2023-08-21, QE 2023-09-04, QE 2023-09-18, QE 2023-10-02, QE 2023-10-16
Participants:

 Description   

While reviewing new code from PM-1632, I spot-checked some existing patterns for how the collation is being resolved by mongos. It appears the are multiple cases where the std::unique_ptr<CollatorInterface> is left nullptr when the collation specification is an empty BSONObj() despite the rule being "when the collation is unspecified in the request the collection's default collation is used to satisfy the operation." Without further investigation it is unclear the extent to which mongos would be doing post-processing of results after merging cursor results (e.g. $group followed by $match) where the collator used by mongos is relevant for the correctness of query results.

Moreover, the contract desired by ExpressionContext is impossible for mongos to uphold given that mongos has no knowledge about the default collation for unsharded collections. The sharding catalog only stores the collation specification for sharded collections.

* The ExpressionContext is always set up with the fully-resolved collation. So even though
* SERVER-24433 describes an ambiguity between a null collator, here we can say confidently that
* null must mean simple since we have already handled "absence of a collator" before creating
* the ExpressionContext.

Hopefully we can produce an interface which leverages the C++ type system to enforce the following contract:

  1. mongos incorrectly fills in the collator for unsharded collections as simple collation but promises to not ever use it.
  2. mongos correctly fills in the collator for sharded collections by consulting the ChunkManager and uses it when responsible for merging + post-processing.


 Comments   
Comment by Githook User [ 10/Oct/23 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@mongodb.com', 'username': 'mtandrei'}

Message: SERVER-76855 Ignore collator on mongos when targeting an unsharded collection and user did not specify collation
Branch: master
https://github.com/mongodb/mongo/commit/cff66caa0096f7a1f20ca7cd2d5927a086f51bb0

Comment by Mihai Andrei [ 22/Aug/23 ]

Going to take this one as this is needed in order to do https://jira.mongodb.org/browse/SERVER-80145 

Comment by Max Hirschhorn [ 04/May/23 ]

Below are my findings from the simple audit I did based on the collation.isEmpty() pattern used in the codebase.


$ git grep -E 'collation.*isEmpty' -- src/mongo/s/
src/mongo/s/chunk_manager.cpp:353:    const bool hasSimpleCollation = (collation.isEmpty() && !_rt->optRt->getDefaultCollator()) ||
src/mongo/s/cluster_commands_helpers.cpp:96:        if (collation.isEmpty()) {
src/mongo/s/cluster_commands_helpers.cpp:185:    if (!collation.isEmpty()) {
src/mongo/s/collection_routing_info_targeter.cpp:322:    if (!collation.isEmpty()) {
src/mongo/s/collection_routing_info_targeter.cpp:600:    if (!collation.isEmpty()) {
src/mongo/s/commands/cluster_distinct_cmd.cpp:214:        if (!collation.isEmpty()) {
src/mongo/s/commands/cluster_distinct_cmd.cpp:281:                                  !collation.isEmpty()
src/mongo/s/commands/cluster_map_reduce_agg.cpp:71:    if (!collationObj.isEmpty()) {
src/mongo/s/commands/cluster_query_without_shard_key_cmd.cpp:88:    if (!collation.isEmpty()) {
src/mongo/s/commands/sharding_expressions.cpp:82:        if (auto collation = indexDescriptor->collation(); !collation.isEmpty()) {
src/mongo/s/query/cluster_aggregate.cpp:123:    if (!collationObj.isEmpty()) {
src/mongo/s/query/cluster_aggregation_planner.cpp:814:    return {collation.isEmpty() ? getCollation() : collation, getUUID()};
src/mongo/s/shard_key_pattern_query_util.cpp:418:    if (!collation.isEmpty()) {
src/mongo/s/write_ops/write_without_shard_key_util.cpp:71:    if (!collation.isEmpty()) {
src/mongo/s/write_ops/write_without_shard_key_util.cpp:90:        if (CollationIndexKey::isCollatableType(idElt.type()) && !collation.isEmpty() &&
src/mongo/s/write_ops/write_without_shard_key_util.cpp:201:    auto collator = collation.isEmpty()

Generated at Thu Feb 08 06:33:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.