[SERVER-85572] Follow up on audit in mongos for improper usage of collation and incorrectly assuming simple collation rather than collection default
Created: 22/Jan/24  Updated: 23/Jan/24

Status: Open
Project: Core Server
Component/s: Distributed Query Execution
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-76855 Audit mongos for improper usage of co... Closed
is related to SERVER-71896 Validate if a query with _id or shard... Closed
is related to SERVER-76857 Have useTwoPhaseProtocol use the coll... Closed
is related to SERVER-24433 Distinguish between the simple collat... Backlog
Assigned Teams:
Query Execution
Operating System: ALL
Participants:

 Description   

The changes from SERVER-76855 ensure that the aggregate command in mongos uses the correct collator when targeting an untracked collection. Other code paths in mongos, outside the aggregate command, also attempt to use the collation. Further investigation is needed to determine how much post-processing mongos does after merging cursor results (e.g. $group followed by $match), since in those cases the collator used by mongos matters for the correctness of query results.

Here is a reference to my simple audit from SERVER-76855 along with a more recent output from searching the codebase:

$ git grep -E 'collation.*isEmpty' -- src/mongo/s/
src/mongo/s/chunk_manager.cpp:674:    const bool hasSimpleCollation = (collation.isEmpty() && !_rt->optRt->getDefaultCollator()) ||
src/mongo/s/cluster_commands_helpers.cpp:119:    const auto noCollationSpecified = collation.isEmpty();
src/mongo/s/cluster_commands_helpers.cpp:226:        if (!collation.isEmpty()) {
src/mongo/s/collection_routing_info_targeter.cpp:406:    if (!collation.isEmpty()) {
src/mongo/s/collection_routing_info_targeter.cpp:777:    if (!collation.isEmpty()) {
src/mongo/s/commands/cluster_distinct_cmd.cpp:373:                                  !collation.isEmpty()
src/mongo/s/commands/cluster_map_reduce_agg.cpp:118:    if (!collationObj.isEmpty()) {
src/mongo/s/commands/cluster_map_reduce_agg.cpp:161:    if (!cm.hasRoutingTable() && collationObj.isEmpty()) {
src/mongo/s/commands/cluster_query_without_shard_key_cmd.cpp:150:    if (!parsedInfo.collation.isEmpty()) {
src/mongo/s/commands/cluster_query_without_shard_key_cmd.cpp:427:            if (parsedInfoFromRequest.collation.isEmpty()) {
src/mongo/s/commands/sharding_expressions.cpp:119:        if (auto collation = indexDescriptor->collation(); !collation.isEmpty()) {
src/mongo/s/query/cluster_aggregate.cpp:140:    if (!collationObj.isEmpty()) {
src/mongo/s/query/cluster_aggregate.cpp:159:    if ((!cri || !cri->cm.hasRoutingTable()) && collationObj.isEmpty()) {
src/mongo/s/query/cluster_aggregation_planner.cpp:633:                    !collationToReturn.isEmpty());
src/mongo/s/query/cluster_aggregation_planner.cpp:816:    if (nss.isCollectionlessAggregateNS() || !collation.isEmpty() || !cm) {
src/mongo/s/shard_key_pattern_query_util.cpp:468:    if (!collation.isEmpty()) {
src/mongo/s/shard_key_pattern_query_util.cpp:476:    if (!cm.hasRoutingTable() && collation.isEmpty()) {
src/mongo/s/write_ops/write_without_shard_key_util.cpp:267:            if (collation.isEmpty()) {
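
Several of the hits above branch on collation.isEmpty() together with hasRoutingTable(). The following is a minimal sketch of the distinction the audit is looking for, not the server's actual code: an empty request collation means "not specified", which is different from "simple". All types and names here (Collation, Source, EffectiveCollation, resolveCollation) are hypothetical stand-ins introduced for illustration.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for a collation spec; not the server's type.
struct Collation {
    std::string locale;  // "simple" denotes the simple (binary) collation
};

// Where the effective collation came from.
enum class Source { kRequest, kCollectionDefault, kUnknown };

struct EffectiveCollation {
    Source source;
    std::optional<Collation> collation;  // empty when source == kUnknown
};

// requested:       collation given in the command; nullopt models collation.isEmpty().
// hasRoutingTable: whether the router tracks the collection.
// trackedDefault:  default collation of a tracked collection; nullopt models "simple".
//                  Ignored when hasRoutingTable is false.
EffectiveCollation resolveCollation(const std::optional<Collation>& requested,
                                    bool hasRoutingTable,
                                    const std::optional<Collation>& trackedDefault) {
    if (requested) {
        // An explicit collation always wins, including an explicit "simple".
        return {Source::kRequest, *requested};
    }
    if (!hasRoutingTable) {
        // Untracked collection: the router does not know the default collation,
        // so it must not silently assume the simple collation here.
        return {Source::kUnknown, std::nullopt};
    }
    return {Source::kCollectionDefault, trackedDefault.value_or(Collation{"simple"})};
}
```

Under these assumptions, a call site that receives kUnknown would have to defer collation-sensitive work to the shard that owns the collection rather than comparing values itself.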



 Comments   
Comment by Max Hirschhorn [ 22/Jan/24 ]

As an example, mihai.andrei@mongodb.com and I spot-checked this block in the distinct command, which appears to use the simple collation to process the values array returned by the distinct command. This behavior in mongos looks suspicious at first because an untracked collection may have a non-simple default collation that governs how the contents of the values array ought to be compared. However, the loop over shardResponses is guaranteed to execute once for an unsharded collection, so BSONObjSet all will still correctly contain the distinct values from that one shard's response (the shard has already applied the collection's default collation to the values array itself).
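
To make the argument above concrete, here is a small self-contained sketch. It is not the code from cluster_distinct_cmd.cpp: it uses std::set over strings in place of BSONObjSet, and CaseInsensitiveLess is a hypothetical stand-in for a non-simple collection default collation. It shows that deduplicating with the simple (binary) comparison is harmless when there is a single shard response, and only diverges once two or more responses are merged.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a case-insensitive collection default collation.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// One shard's "values" array, already deduplicated by that shard
// under the collection's default collation.
using ShardResponse = std::vector<std::string>;

// Merge the per-shard distinct values, deduplicating with comparator Less.
// std::less<std::string> models the simple (binary) collation.
template <typename Less>
std::vector<std::string> mergeDistinct(const std::vector<ShardResponse>& responses) {
    std::set<std::string, Less> all;
    for (const auto& response : responses) {
        all.insert(response.begin(), response.end());
    }
    return std::vector<std::string>(all.begin(), all.end());
}
```

With one response such as {"apple", "Banana"}, mergeDistinct<std::less<std::string>> and mergeDistinct<CaseInsensitiveLess> keep the same two values. With two responses {"apple"} and {"APPLE"}, the binary comparison keeps both values while the case-insensitive one keeps a single value, which is the situation the single-iteration guarantee rules out for an unsharded collection.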

Generated at Thu Feb 08 06:58:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.