When using mapReduce over a large collection (several million documents) with the output mode set to "replace", the replacement is not really atomic: the command appears to re-reduce into the existing output collection.
I use a mapReduce operation to find duplicates (based on one field) in a sharded environment.
The input collection has several million documents, and so does the output (they should have the same number of elements, because in theory there are no duplicates).
However, if I re-run the mapReduce (using the replace output option from the mongodb shell), many false positives appear (~800 out of 17 million documents are counted twice).
If I drop the output collection before re-running the mapReduce, no duplicates are found.
function mapDoublonsSqlId() {
    // key on the sqlId field being checked for duplicates
    emit(this.sqlId, 1);
}
function reduceDoublonsSqlId(key, values) {
    // sum the per-key counts: a result > 1 means the sqlId
    // was emitted by more than one document
    var total = 0;
    values.forEach(function(o) {
        total += o;
    });
    return total;
}
db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
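// each output document pairs a sqlId with its occurrence count;
// the _id below is illustrative
db.tmp.findOne()
// { "_id" : "12345", "value" : 1 }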
db.tmp.count({value : {$gt : 1}}) // ok: returns 0, no duplicates
db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
db.tmp.count({value : {$gt : 1}}) // here is the issue: ~800 false duplicates are reported
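// inspect one of the ~800 false positives; the _id is illustrative, and the
// value of 2 matches the "counted twice" symptom described above
db.tmp.findOne({value : {$gt : 1}})
// { "_id" : "12345", "value" : 2 }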
db.tmp.drop()
db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
db.tmp.count({value : {$gt : 1}}) // ok: no duplicates any more
It seems that the replace output mode does not work as expected: instead of starting from an empty collection, it appears to reduce the new results against the existing contents of the output collection.
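Until this is resolved, a minimal workaround sketch based on the observation above (the helper name is mine, not part of any API) is to drop the output collection explicitly before every run:

function runDoublonsMapReduce() {
    // workaround: replace does not appear to start from an empty
    // collection here, so drop the previous output first
    db.tmp.drop();
    return db.runCommand({
        mapreduce : "products",
        map : mapDoublonsSqlId,
        reduce : reduceDoublonsSqlId,
        out : {replace : "tmp"}
    });
}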