[SERVER-4876] Map reduce with option "replace" is reducing instead Created: 06/Feb/12 Updated: 15/Aug/12 Resolved: 16/Mar/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Grégoire Seux | Assignee: | Antoine Girbal |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | mapreduce, options | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
linux centos |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
When using map reduce over a large collection (several million documents) with the output set to "replace", the replace is not really an atomic replacement: it seems to "reduce" into the existing output collection. I use a map reduce operation to find duplicates (based on one field) in a sharded environment. However, if I relaunch the map reduce (using the replace output option from the mongodb shell), a lot of false positives are found (~800 out of 17 million documents are counted twice).
function mapDoublonsSqlId() { , 1)
function reduceDoublonsSqlId(key,values) { )
db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
db.runCommand({mapreduce : "products", map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : "tmp"}})
db.tmp.drop()
It seems that the replace does not work as expected. |
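The symptom described above can be illustrated with a small in-memory model, written as plain JavaScript rather than against a real server. This is only a sketch under stated assumptions: the function bodies in the report are truncated, so `mapDuplicates`, `reduceDuplicates`, the `sqlId` field name, and the `runMapReduce` helper are all hypothetical stand-ins. It shows that running the same job twice with "replace" semantics leaves the counts unchanged, while "reduce" semantics fold the second run into the first and make every key look duplicated.

```javascript
// Hypothetical stand-ins for the truncated functions in the report.
// The field name `sqlId` is an assumption inferred from the function names.
function mapDuplicates(doc, emit) {
  emit(doc.sqlId, 1);
}

function reduceDuplicates(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// In-memory model of one map-reduce run (not the server implementation).
// `mode` mimics the `out` option: "replace" discards the previous output,
// "reduce" folds the new results into the previous output.
function runMapReduce(docs, previousOutput, mode) {
  const groups = new Map();
  for (const doc of docs) {
    mapDuplicates(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }

  const fresh = new Map();
  for (const [key, values] of groups) {
    fresh.set(key, reduceDuplicates(key, values));
  }
  if (mode === "replace") return fresh;

  const merged = new Map(previousOutput);
  for (const [key, value] of fresh) {
    merged.set(
      key,
      merged.has(key) ? reduceDuplicates(key, [merged.get(key), value]) : value
    );
  }
  return merged;
}

// Keys whose count exceeds 1 are reported as duplicates.
const duplicates = (out) =>
  [...out].filter(([, count]) => count > 1).map(([key]) => key);

const products = [{ sqlId: 1 }, { sqlId: 2 }, { sqlId: 2 }, { sqlId: 3 }];

// Two consecutive runs with "replace": only the real duplicate is reported.
let replaced = runMapReduce(products, new Map(), "replace");
replaced = runMapReduce(products, replaced, "replace");

// Two consecutive runs with "reduce": every key is counted at least twice.
let reduced = runMapReduce(products, new Map(), "reduce");
reduced = runMapReduce(products, reduced, "reduce");

console.log(duplicates(replaced)); // [ 2 ]
console.log(duplicates(reduced));  // [ 1, 2, 3 ]
```

If the second run reports keys that appear only once in the source collection, as in the `reduced` case, the output behaves like a "reduce" into the existing collection rather than a replacement.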
| Comments |
| Comment by Grégoire Seux [ 16/Mar/12 ] |
|
No, it does not happen anymore. You can close this ticket. |
| Comment by Antoine Girbal [ 15/Mar/12 ] |
|
Are you still seeing this issue? |
| Comment by Antoine Girbal [ 06/Feb/12 ] |
|
I tried but cannot reproduce this issue with v2.0.2 and a sharded collection of 200k documents.
|