[SERVER-22060] Sharded mapReduce output option {replace: "coll", sharded: true} can create an invalid sharded collection Created: 04/Jan/16  Updated: 06/Dec/22  Resolved: 05/Jan/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-14324 MapReduce does not respect existing s... Closed
Assigned Teams:
Sharding
Operating System: ALL
Steps To Reproduce:

(function() {
    'use strict';
 
    var st = new ShardingTest({shards: 2});
    var sdb = st.s.getDB("test");
    var inputColl = sdb.input;
    var outputColl = sdb.output;
 
    assert.commandWorked(sdb.adminCommand({enableSharding: sdb.getName()}));
    // Not asserted: movePrimary can return an error if shard0000 is already the primary.
    sdb.adminCommand({movePrimary: sdb.getName(), to: "shard0000"});
 
    assert.writeOK(inputColl.insert({a: 1, b: 5}));
    assert.writeOK(inputColl.insert({a: 1, b: 6}));
    assert.writeOK(inputColl.insert({a: 2, b: 7}));
    assert.writeOK(inputColl.insert({a: 2, b: 8}));
 
    // Create a sharded collection with a chunk on each shard.
    assert.commandWorked(outputColl.ensureIndex({c: 1}));
    assert.writeOK(outputColl.insert({c: -1}));
    assert.writeOK(outputColl.insert({c: 1}));
    assert.commandWorked(sdb.adminCommand({
        shardCollection: outputColl.getFullName(),
        key: {c: 1}
    }));
    assert.commandWorked(sdb.adminCommand({split: outputColl.getFullName(), middle: {c: 0}}));
    assert.commandWorked(sdb.adminCommand({
        moveChunk: outputColl.getFullName(),
        find: {c: 1},
        to: "shard0001"
    }));
 
    // This creates a collection sharded by {c: 1}, whose documents do not contain the shard key.
    assert.commandWorked(sdb.runCommand({
        mapReduce: inputColl.getName(),
        map: function() {
            emit(this.a, this.b);
        },
        reduce: function(key, values) {
            var sum = 0;
            for (var i = 0; i < values.length; i++) {
                sum += values[i];
            }
            return sum;
        },
        out: {replace: "output", sharded: true}
    }));
 
    // Both output documents should be returned; while the bug is present, this assertion can fail.
    assert.eq(2, outputColl.find().itcount());
})();

 Description   

Suppose a collection "coll" is sharded by a shard key other than _id, for example {shardKey: 1}. Users can run a mapReduce with the output option {replace: "coll", sharded: true}. This option means that the contents of "coll" are replaced by the mapReduce output and that the resulting collection is sharded.

New sharded collections created by mapReduce are sharded by {_id: 1}, and several places in the code rely on that assumption. During the replace, however, the collection is never re-sharded by {_id: 1}; the sharding metadata still records the shard key as {shardKey: 1}.
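To make the mismatch concrete, here is a minimal JavaScript model of the metadata, assuming (as described above) that the replace leaves the stored key pattern untouched. The names `sameKeyPattern`, `MR_OUTPUT_KEY`, and `metadataAfterReplace` are hypothetical illustrations, not MongoDB server code; the `{c: 1}` key comes from the repro.

```javascript
// Hypothetical model of the sharding metadata; not a MongoDB API.

// Compare two key patterns field by field, including field order,
// since shard key patterns are ordered.
function sameKeyPattern(a, b) {
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  return (
    ka.length === kb.length &&
    ka.every((field, i) => field === kb[i] && a[field] === b[field])
  );
}

// The key pattern mapReduce assumes for a sharded output collection.
const MR_OUTPUT_KEY = { _id: 1 };

// Reported behaviour: {replace: ..., sharded: true} copies the existing
// metadata as-is, so the old key pattern survives the replace.
function metadataAfterReplace(existingMetadata) {
  return { ...existingMetadata };
}

const before = { ns: "test.output", key: { c: 1 } };
const after = metadataAfterReplace(before);

// The stored key no longer matches what mapReduce assumes.
console.log(sameKeyPattern(after.key, MR_OUTPUT_KEY)); // false
```

Field order is compared deliberately: `{a: 1, b: 1}` and `{b: 1, a: 1}` are different shard keys.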

The documents in the mapReduce output collection are of the form

{_id: <key>, value: <value>}

These documents do not contain the shard key field shardKey. At this point the collection is invalid, and queries against it can return incorrect results. The steps to reproduce demonstrate this with a collection sharded by {c: 1}.
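The following sketch recomputes the repro's mapReduce output in plain JavaScript (grouping by `a` and summing `b`, as the map and reduce functions do) and checks each output document against the stale `{c: 1}` key pattern. The helper `missingShardKeyFields` is hypothetical and only inspects top-level fields; dotted shard key paths are not handled.

```javascript
// Hypothetical helper (not a MongoDB API): list the shard key fields
// that are absent from a document. Top-level fields only.
function missingShardKeyFields(doc, keyPattern) {
  return Object.keys(keyPattern).filter(
    (field) => !Object.prototype.hasOwnProperty.call(doc, field)
  );
}

// Input documents from the repro.
const input = [
  { a: 1, b: 5 },
  { a: 1, b: 6 },
  { a: 2, b: 7 },
  { a: 2, b: 8 },
];

// Group by `a` and sum `b`, mirroring emit(this.a, this.b) plus the summing reduce.
const sums = new Map();
for (const { a, b } of input) {
  sums.set(a, (sums.get(a) || 0) + b);
}

// mapReduce output documents have the form {_id: <key>, value: <value>}.
const outputDocs = [...sums].map(([key, value]) => ({ _id: key, value }));

for (const doc of outputDocs) {
  const missing = missingShardKeyFields(doc, { c: 1 });
  console.log(`_id ${doc._id}: missing ${JSON.stringify(missing)}`);
}
// _id 1: missing ["c"]
// _id 2: missing ["c"]
```

Against `{_id: 1}`, the key mapReduce assumes, the same documents would report no missing fields.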



 Comments   
Comment by Scott Hernandez (Inactive) [ 05/Jan/16 ]

Dup of SERVER-14324.

Generated at Thu Feb 08 03:59:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.