  1. Core Server
  2. SERVER-4876

Map reduce with option "replace" is reducing instead

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.0.0
    • Component/s: MapReduce
    • Labels:
    • Environment:
      linux centos
    • ALL

      When running map reduce over a large collection (several million documents) with the output option set to "replace", the replacement does not appear to be atomic: the results seem to be reduced into the existing output collection instead.

      I use a map reduce operation to find duplicates (based on one field) in a sharded environment.
      The input collection has several million documents, and so does the output (they should have the same number of elements because, in theory, there should not be any duplicates).
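
      The duplicate check described above can be sketched in plain JavaScript, without a server. This is only an in-memory illustration: `runMapReduce`, `mapBySqlId`, `reduceSum` and the sample documents are hypothetical names and data made up for the example, and the global `emit` here merely stands in for the one the server provides. A key whose final value is greater than 1 was emitted by more than one document.

```javascript
// In-memory sketch only; none of these helpers are MongoDB APIs.
var emitted; // (key, value) pairs collected during the current run

// Stand-in for the emit() that the server makes available to map functions.
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same shape as the functions in this report: one emit of 1 per document.
function mapBySqlId() {
  emit({ p: this.partnerId, id: this.sqlId }, 1);
}

function reduceSum(key, values) {
  var total = 0;
  values.forEach(function (v) { total += v; });
  return total;
}

function runMapReduce(docs, map, reduce) {
  emitted = [];
  // Call map with `this` bound to each document.
  docs.forEach(function (doc) { map.call(doc); });

  // Group emitted values by a string form of their key.
  var groups = {};
  emitted.forEach(function (e) {
    var k = JSON.stringify(e.key);
    if (!groups[k]) { groups[k] = []; }
    groups[k].push(e.value);
  });

  // Reduce each group; a key with a single value is passed through unreduced.
  var out = {};
  Object.keys(groups).forEach(function (k) {
    var values = groups[k];
    out[k] = values.length === 1 ? values[0] : reduce(JSON.parse(k), values);
  });
  return out;
}

// Made-up sample data: the first and third documents share the same key.
var docs = [
  { partnerId: 1, sqlId: 10 },
  { partnerId: 1, sqlId: 11 },
  { partnerId: 1, sqlId: 10 }
];
var result = runMapReduce(docs, mapBySqlId, reduceSum);
var duplicates = Object.keys(result).filter(function (k) { return result[k] > 1; });
console.log(duplicates);
```

      With a collection that really contains no duplicates, every value would be 1 and the filter would return an empty list, which is what the first run in this report shows.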

      However, if I relaunch the map reduce (using the "replace" output option from the mongodb shell), many false positives are found (~800 out of 17 million documents are counted twice).
      If I drop the output collection before re-running the map reduce, no duplicates are found.

      function mapDoublonsSqlId() {
          emit({p : this.partnerId, id : this.sqlId}, 1);
      }

      function reduceDoublonsSqlId(key, values) {
          var total = 0;
          values.forEach(function(o) { total += o; });
          return total;
      }

      db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
      db.tmp.count({value : {$gt : 1}}) // OK: no duplicates

      db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
      db.tmp.count({value : {$gt : 1}}) // here is the issue: many false duplicates are reported

      db.tmp.drop()
      db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
      db.tmp.count({value : {$gt : 1}}) // OK: no duplicates any more

      It seems that the replace does not work as expected.
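
      To make the suspected behaviour concrete, here is a minimal sketch that models the "replace" and "reduce" output modes on plain objects. It is an assumption-laden illustration, not the server's implementation and not a confirmed cause: `outReplace`, `outReduce`, `sum` and the sample keys are hypothetical. It only shows that if some entries of the previous output were re-reduced with the new results, a unique key would end up with a value of 2, which is the symptom seen on the second run.

```javascript
// Hypothetical helpers; plain objects stand in for the output collection.

// "replace" semantics: the previous output is discarded entirely.
function outReplace(existing, fresh) {
  return Object.assign({}, fresh);
}

// "reduce" semantics: a key present in both outputs is combined
// by calling the reduce function on the old and new values.
function outReduce(existing, fresh, reduce) {
  var merged = Object.assign({}, existing);
  Object.keys(fresh).forEach(function (k) {
    merged[k] = (k in merged) ? reduce(k, [merged[k], fresh[k]]) : fresh[k];
  });
  return merged;
}

function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Made-up data: every key is unique in the input, so each run yields 1 per key.
var previousOutput = { a: 1, b: 1, c: 1 };
var newResults = { a: 1, b: 1, c: 1 };

// Expected behaviour of "replace": values stay at 1.
var replaced = outReplace(previousOutput, newResults);

// Assumed failure scenario: only key "a" of the old output is left over
// and gets reduced with the new results, so "a" is counted twice.
var leftover = { a: previousOutput.a };
var partlyReduced = outReduce(leftover, newResults, sum);

console.log(replaced, partlyReduced);
```

      Dropping the output collection first, as in the last run above, corresponds to starting from an empty previous output, where both modes give the same result.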

            Assignee:
            antoine Antoine Girbal
            Reporter:
            kamaradclimber Grégoire Seux
            Votes:
            0
            Watchers:
            2
