[SERVER-23278] Sharded mapReduce returns wrong result Created: 21/Mar/16  Updated: 06/Dec/22  Resolved: 24/Mar/16

Status: Closed
Project: Core Server
Component/s: MapReduce, Sharding
Affects Version/s: 3.2.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Query Team (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query
Operating System: ALL
Steps To Reproduce:

 
var st = new ShardingTest({
        shards: {rs0: {nodes: 1}, rs1: {nodes: 1}},
        mongos: 1,
    });
 
var dbName = "wc-test";
var mongos = st.s;
var db = mongos.getDB(dbName);
var collName = 'leaves';
var coll = db[collName];
function createShards() {
    var numberDoc = 20;
    db.adminCommand({movePrimary: db.toString(), to: st._shardNames[0]});
    db.adminCommand({enablesharding: db.toString()});
    coll.ensureIndex({x: 1}, {unique: true});
    db.adminCommand({shardCollection: coll.toString(), key: {x: 1}});
    st.configRS.awaitLastOpCommitted();
    db.adminCommand({split: coll.getFullName(), middle: {x: numberDoc / 2}});
 
    db.adminCommand({
        moveChunk: coll.getFullName(),
        find: {x: numberDoc / 2},
        to: st._shardNames[1],
    });
 
    for (var i = 0; i < numberDoc; i++) {
        coll.insert({x: i});
    }
    assert.eq(coll.count(), numberDoc);
}
 
createShards();
coll.insert({x: -3, tags: ["a", "b"]});
coll.insert({x: -7, tags: ["b", "c"]});
coll.insert({x: 23, tags: ["c", "a"]});
coll.insert({x: 27, tags: ["b", "c"]});
 
printjson(db.runCommand({
            mapReduce: collName,
            map: function() {
                if(!this.tags) {
                    return;
                }
                this.tags.forEach(function(z) {
                    emit(z, 1);
                });
            },
            reduce: function(key, values) {
                return {
                    count: values.length
                };
            },
            out: {inline: 1}
        }));

Participants:

 Description   

I ran the following mapReduce query and the output was incorrect. I received the results shown below. If you remove the "createShards();" call in the repro, you'll see that you get the expected results. Similarly, if you change the x values in the tagged documents to all be on the same shard you also get the expected results.

Received:

	"results" : [
		{
			"_id" : "a",
			"value" : {
				"count" : 2
			}
		},
		{
			"_id" : "b",
			"value" : {
				"count" : 2
			}
		},
		{
			"_id" : "c",
			"value" : {
				"count" : 2
			}
		}
	],

Expected:

	"results" : [
		{
			"_id" : "a",
			"value" : {
				"count" : 2
			}
		},
		{
			"_id" : "b",
			"value" : {
				"count" : 3
			}
		},
		{
			"_id" : "c",
			"value" : {
				"count" : 3
			}
		}
	],



 Comments   
Comment by Asya Kamsky [ 19/Oct/16 ]

Marmor,

This bug is closed because original reproducer had an error in the reduce function.

If you think you've encountered a bug, please open a new ticket rather than adding to a closed ticket.

Comment by Mor [X] [ 19/Oct/16 ]

I think I've observed this issue on our setup, using 4 shards, each having 3 servers: primary, secondary, backup.

Since I can't use mapReduce's limit function (SERVER-2099) I've chosen two _id values to use for the mapReduce query.

Counting the number of documents between those documents:

db.getCollection('my_collection').count({ _id : { $gte : ObjectId("57d6c1ce691fa014e0615aa1"), $lte : ObjectId("57d6d2872e9af409e0583137") }} )

gives: 10309580.

However, when I run mapReduce (simplified) in node.js:

var query = { _id : { $gte : db.ObjectId(first_id), $lte: db.ObjectId(last_id) } };
  entities.mapReduce(
    function () {
      emit("count", 1);
    },
    function (key, values) {
      return Array.sum(values)
    },
    {
      query : query,
      out: {replace:"results"}
    },
    function (error) {
      ...
    }
  );

I'm getting:

{
    "_id" : "count",
    "value" : 9978352
}

Generated at Thu Feb 08 04:02:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.