[SERVER-26426] mapReduce key comparisons do not respect the collation
Created: 30/Sep/16  Updated: 06/Dec/22  Resolved: 09/Mar/20

Status: Closed
Project: Core Server
Component/s: MapReduce, Querying
Affects Version/s: None
Fix Version/s: 4.4.0

Type: Bug
Priority: Major - P3
Reporter: David Storch
Assignee: Backlog - Query Team (Inactive)
Resolution: Done
Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-26422 Group command does not group with res... Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

The keys produced by the mapReduce emit function are always compared using the simple collation rather than the mapReduce operation's collation. For example, consider the following script:

db.c.drop();
db.c.insert({_id: 1, str: "foo"});
db.c.insert({_id: 2, str: "FOO"});
db.c.insert({_id: 3, str: "bar"});
db.c.insert({_id: 4, str: "BAR"});
 
db.c.mapReduce(
    function() { emit(this.str, 1); },
    function(key, values) { return Array.sum(values); },
    {out: {inline: 1}, collation: {locale: "en", strength: 2}}
);

This script produces output such as the following:

{
	"results" : [
		{
			"_id" : "BAR",
			"value" : 1
		},
		{
			"_id" : "FOO",
			"value" : 1
		},
		{
			"_id" : "bar",
			"value" : 1
		},
		{
			"_id" : "foo",
			"value" : 1
		}
	],
	"timeMillis" : 133,
	"counts" : {
		"input" : 4,
		"emit" : 4,
		"reduce" : 0,
		"output" : 4
	},
	"ok" : 1
}

Because the operation specifies the English case-insensitive collation (locale "en", strength 2), the results array should contain 2 elements rather than 4: "FOO" == "foo" and "BAR" == "bar" with respect to that collation. Instead, all four keys are treated as distinct, and the reduce function is never invoked ("reduce" : 0).
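
For illustration, the expected grouping can be sketched in plain JavaScript outside the server. This is a rough sketch under stated assumptions, not the server's implementation: Intl.Collator with sensitivity "accent" is used here as a stand-in for {locale: "en", strength: 2}, the helper mapReduceWithCollation is hypothetical, emit is passed to the map function as an argument rather than being a global, and the choice of the first emitted key as the group's representative _id is arbitrary.

```javascript
// Assumption: Intl.Collator with sensitivity "accent" ignores case but not
// accents, approximating the collation {locale: "en", strength: 2}.
const collator = new Intl.Collator("en", { sensitivity: "accent" });

const docs = [
  { _id: 1, str: "foo" },
  { _id: 2, str: "FOO" },
  { _id: 3, str: "bar" },
  { _id: 4, str: "BAR" },
];

// Hypothetical helper: an in-memory map/reduce that compares emitted keys
// with the supplied collator instead of a simple binary comparison.
function mapReduceWithCollation(docs, map, reduce, collator) {
  // Map phase: collect every (key, value) pair emitted for each document.
  const emitted = [];
  for (const doc of docs) {
    map.call(doc, (key, value) => emitted.push({ key, value }));
  }

  // Group phase: keys that compare equal under the collator share a group.
  // The first key seen becomes the group's representative (an assumption).
  const groups = [];
  for (const { key, value } of emitted) {
    const group = groups.find((g) => collator.compare(g.key, key) === 0);
    if (group) {
      group.values.push(value);
    } else {
      groups.push({ key, values: [value] });
    }
  }

  // Reduce phase: as in mapReduce, reduce only runs for keys with
  // more than one emitted value.
  return groups.map((g) => ({
    _id: g.key,
    value: g.values.length === 1 ? g.values[0] : reduce(g.key, g.values),
  }));
}

const results = mapReduceWithCollation(
  docs,
  function (emit) { emit(this.str, 1); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); },
  collator
);

console.log(JSON.stringify(results));
```

With a collation-aware comparison, the four emitted keys collapse into two groups, each with a value of 2, which is the result the report expects from the mapReduce command.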



 Comments   
Comment by Charlie Swanson [ 09/Mar/20 ]

This is no longer a problem after completing a recent project where we created a new implementation of mapReduce backed by the aggregation framework.

Comment by Asya Kamsky [ 18/May/18 ]

Sharded mapReduce enforces sharding on _id only - the shard key cannot be anything other than simple collation, right?

 
