Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-4382

MR becomes very slow if it keeps reducing a very large object

    • Type: Icon: Improvement Improvement
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 2.1.0
    • Affects Version/s: 2.0.1
    • Component/s: MapReduce
    • Labels:

      If many values are emitted for same key, and object keeps growing, MR does not flush it to disk until it reaches a threshold size.
      But a large object becomes very slow to handle for JS and it may use much more memory than we think it's using.
      It triggers many reduce steps and potential GC.
      Example is:

      map = function() {
        emit(this.full_name, this._id);
      }
      
      reduce = function(k,vals) {
           var tmp = {};
           vals.forEach(function(i) {
              if(typeof(i) == 'string') {
                tmp[i] = true;
              } else {
                for(var z in i) tmp[z] = true;
              }
           });
           return tmp;
      }
      

      Against a collection with 1m docs like:

      {
              "_id" : {__rand: "str", len: 20},
              "soc_id" : {__rand: "str", len: 10},
              "exp" : {__rand: "int", min: 0, max: 100000000},
              "full_name" : "Natalya",
              "last_entrance" : 1321935873,
              "score" : 5000
      }
      

            Assignee:
            antoine Antoine Girbal
            Reporter:
            antoine Antoine Girbal
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: