[SERVER-16045] null value in mapReduce results Created: 10/Nov/14  Updated: 10/Jan/15  Resolved: 10/Jan/15

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dmitry Poklonskiy Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongodump2.json    
Operating System: OS X
Participants:

 Description   

db version v2.6.4

I have ~100 entries in my collection (dump attached) and I run mapReduce on it with the following command:

db.features.mapReduce(
  function () {
    emit(this._quadKey.substr(0, 1), this );
  },
  function (key, values) {
    var result = { q: [], ids: [] };
    values.forEach(       
      function (it) {
        result.q.push(it._quadKey);
        result.ids.push(it._id);
      }
    );
    return result; 
  },
  {
    query: { _quadKey: { $regex: '^(0|1|2|3)' } },
    out: { inline: 1 } 
  }
)

"_quadKey" field which is used for grouping in Map function is a string of base-4 numbers that looks like "33100321300223132221023"

As a result of this command I get 5 reduce calls instead of 4 (see the counts in the output),
and a "null" value in the last group, with key "3", which contains only 2 entries besides the null (there should be 20+).

{
        "results" : [
                {
                        "_id" : "0",
                        "value" : {
                                "q" : [
                                        "00211222022320022310003",
...
                                        "03332223130001223231310"
                                ],
                                "ids" : [
                                        ObjectId("545e7065eb6be425a700003a"),
...
                                        ObjectId("545e7052eb6be425a700002c")
                                ]
                        }
                },
                {
                        "_id" : "1",
                        "value" : {
                                "q" : [
                                        "10033101320022232203331",
...
                                        "13331332032210022212221"
                                ],
                                "ids" : [
                                        ObjectId("545e709beb6be425a700005f"),
...
                                        ObjectId("545e7043eb6be425a700001e")
                                ]
                        }
                },
                {
                        "_id" : "2",
                        "value" : {
                                "q" : [
                                        "20003331310022201013110",
...
                                        "23302313120113202111313"
                                ],
                                "ids" : [
                                        ObjectId("545e7050eb6be425a700002a"),
...
                                        ObjectId("545e7063eb6be425a7000038")
                                ]
                        }
                },
                {
                        "_id" : "3",
                        "value" : {
                                "q" : [
                                        null,
                                        "33100321300223132221023",
                                        "33101232232201203113211"
                                ],
                                "ids" : [
                                        null,  // Here it is!!!     
                                        ObjectId("545e7059eb6be425a7000031"),
                                        ObjectId("545e7081eb6be425a7000052")
                                ]
                        }
                }
        ],
        "timeMillis" : 12,
        "counts" : {
                "input" : 102,
                "emit" : 102,
                "reduce" : 5,
                "output" : 4
        },
        "ok" : 1
}

I have also noticed that the group where the "null" appears depends on the index.
If I create an index on the "_quadKey" field, the null appears in the last group (key "3"):

db.features.ensureIndex({ _quadKey: 1 })

If I drop this index, the null appears in the first group (key "0") instead:

db.features.dropIndex({ _quadKey: 1 })

{
        "results" : [
                {
                        "_id" : "0",
                        "value" : {
                                "q" : [
                                        null,
                                        "00222330102200231213001",
                                        "00211222022320022310003"
                                ],
                                "ids" : [
                                        null,
                                        ObjectId("545e701eeb6be425a7000004"),
                                        ObjectId("545e7065eb6be425a700003a")
                                ]
                        }
                },
...
        ],
        "timeMillis" : 6,
        "counts" : {
                "input" : 102,
                "emit" : 102,
                "reduce" : 5,
                "output" : 4
        },
        "ok" : 1
}



 Comments   
Comment by Ramon Fernandez Marina [ 10/Jan/15 ]

dimik, after some investigation it seems that your reduce function does not meet the necessary requirements. Note specifically the following paragraph:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

The null value appears because the reduce function is called a second time for the same key (which is why you're seeing 5 reduce calls instead of the 4 you expect). In that call you push it._id onto the result, but one of the it values is not a document from your collection: it is the previous result of calling the reduce function with the same key, which has no _id or _quadKey field, so it._id comes out as null.

This also explains why you don't see all the _quadKey entries you're expecting: in this second call to reduce you discard the results from previous calls.
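
The effect can be reproduced outside MongoDB. The following is a minimal sketch in plain JavaScript that simulates a second reduce call by hand; the docs array and its short keys are hypothetical stand-ins, not data from the attached dump, and the two-step call order is only an assumption about how the server might batch the values:

```javascript
// The reduce function from the report, unchanged.
function originalReduce(key, values) {
  var result = { q: [], ids: [] };
  values.forEach(function (it) {
    result.q.push(it._quadKey);
    result.ids.push(it._id);
  });
  return result;
}

// Hypothetical documents standing in for the collection (not real data).
var docs = [
  { _id: "a", _quadKey: "30" },
  { _id: "b", _quadKey: "31" },
  { _id: "c", _quadKey: "32" },
  { _id: "d", _quadKey: "33" }
];

// First call: only part of the values for key "3" are reduced.
var partial = originalReduce("3", docs.slice(0, 2));

// Second call: the previous output is passed back in as one of the values.
// It has no _quadKey or _id field, so undefined is pushed for it.
var rereduced = originalReduce("3", [partial].concat(docs.slice(2)));

// undefined serializes as null, and the first two documents are lost.
console.log(JSON.stringify(rereduced));
// prints {"q":[null,"32","33"],"ids":[null,"c","d"]}
```

The shape matches the report: one null followed by only the values seen in the last call.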

If I modify your reduce function to detect this case then everything works as expected:

var myreduce = function (key, values) {
    var tmp = { q: [], ids: [] };
    var result = { q: [], ids: [] };
 
    values.forEach(
      function (it) {
        if (it._id == null) {
            // Results from the previous call
            result.q = result.q.concat(it.q);
            result.ids = result.ids.concat(it.ids);
            return;
        }
        tmp.q.push(it._quadKey);
        tmp.ids.push(it._id); 
      }
    ); 
 
    result.q = result.q.concat(tmp.q);
    result.ids = result.ids.concat(tmp.ids);
    return result;
};

Please check the rest of the requirements for the reduce function and modify your code accordingly.
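
An alternative that avoids the special case is to have map emit values that already have the same shape as the reduce output, so that reduce only ever merges. This is a minimal sketch in plain JavaScript, runnable outside MongoDB; toEmittedValue, mergeReduce and the sample documents are hypothetical names and data introduced only for illustration:

```javascript
// Stand-in for the map step. In the shell, the equivalent emit would be:
//   emit(this._quadKey.substr(0, 1), { q: [this._quadKey], ids: [this._id] });
function toEmittedValue(doc) {
  return { q: [doc._quadKey], ids: [doc._id] };
}

// Reduce only merges values of its own output shape, so it does not matter
// whether a value comes from map or from an earlier reduce call.
function mergeReduce(key, values) {
  var result = { q: [], ids: [] };
  values.forEach(function (it) {
    result.q = result.q.concat(it.q);
    result.ids = result.ids.concat(it.ids);
  });
  return result;
}

// Hypothetical sample documents (not taken from the attached dump).
var sample = [
  { _id: "a", _quadKey: "30" },
  { _id: "b", _quadKey: "31" },
  { _id: "c", _quadKey: "32" },
  { _id: "d", _quadKey: "33" }
].map(toEmittedValue);

// Reducing everything at once and reducing in two steps give the same result.
var onePass = mergeReduce("3", sample);
var firstHalf = mergeReduce("3", sample.slice(0, 2));
var twoPass = mergeReduce("3", [firstHalf].concat(sample.slice(2)));

console.log(JSON.stringify(onePass) === JSON.stringify(twoPass)); // prints true
```

With this shape, a key that has only a single emitted value (for which reduce is not called at all) still produces output in the same { q, ids } form as every other key.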

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 10/Jan/15 ]

Hi dimik, apologies for the late reply. Thanks for the data and the reproducer, we can observe the same behavior you describe and we're investigating.

Comment by Dmitry Poklonskiy [ 10/Nov/14 ]

Forgot to say: I ran the mapReduce command from the mongo shell.
