[SERVER-4264] add option for map/reduce to output to the primary of a replica set or a different server Created: 11/Nov/11 Updated: 06/Dec/22 Resolved: 04/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Daniel Pasette (Inactive) | Assignee: | Backlog - Query Execution |
| Resolution: | Done | Votes: | 15 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Execution |
| Case: | (copied to CRM) | ||||||||
| Description |
|
In the mongo shell, after connecting directly to a secondary, I can "use local" and then save documents to that database. I know they're not replicated, and that I need to copy the result somewhere else if I want to preserve it. However, if I try to use the "out" option of "mapreduce" to write the output of a map-reduce job to "local", I get an error saying I can't do that. It looks like the write is checked too high up in the call stack; the check appears to come from here: https://github.com/mongodb/mongo/blob/master/db/commands/mr.cpp#L984-991 |
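For context, the "copy the result somewhere else" approach can be sketched with inline output, which is the one "out" mode that map-reduce accepts on a secondary. The helper below is hypothetical and not part of MongoDB, and the "events" collection with its "type" field is an assumption made up for illustration. It only builds the command document; the shell calls are shown as comments because they need a live replica set.

```javascript
// Hypothetical helper (not part of MongoDB): builds a mapReduce command
// document that returns its results inline instead of writing a collection.
function buildInlineMapReduce(collection, mapFn, reduceFn, query) {
  const cmd = {
    mapReduce: collection,
    map: mapFn,
    reduce: reduceFn,
    out: { inline: 1 } // nothing is written on the server that runs the job
  };
  if (query !== undefined) {
    cmd.query = query; // optional filter applied before the map phase
  }
  return cmd;
}

// Example map function; `emit` is supplied by the server when the job runs.
// The "type" field is an assumption for illustration.
function mapByType() {
  emit(this.type, 1);
}

// Example reduce function: sums the counts emitted for one key.
function sumCounts(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

const cmd = buildInlineMapReduce("events", mapByType, sumCounts);

// In a mongo shell connected directly to a secondary, usage might look like:
//   rs.slaveOk();                  // allow reads on this secondary
//   var res = db.runCommand(cmd);  // run the job with inline output
//   // res.results holds the output documents; the client then copies
//   // them to wherever they should be kept.
```

Inline results come back in a single response and must fit within the maximum BSON document size, so this sketch only suits modest outputs; it does not replace the server-side option this ticket asks for.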
| Comments |
| Comment by Esha Bhargava [ 04/Feb/22 ] |
|
Closing these tickets as part of the deprecation of mapReduce. |
| Comment by Mark Hansen [ 01/Oct/12 ] |
|
The KEY use-case for us needing this feature is being able to do Reporting/Analytics processing on one of the SECONDARIES. |
| Comment by Azat Khuzhin [ 17/May/12 ] |
|
@T. Dampier, I agree, and I think "output" to a remote collection is a good decision. Correct me if I am wrong: if the PRIMARY of the replica set changes, then MR returns the error "db assertion 13312 – when it cannot commit its result collection because former PRIMARY is no longer PRIMARY"? |
| Comment by T. Dampier [ 17/May/12 ] |
|
@azat - for some background on this thread, see Basically, it's tough to run a large and/or intense M/R job on the PRIMARY of a replica set that has any other duties to perform. In these situations, it would be nice to be able to (a) do the work on a SECONDARY, and then (b) have a good place to write the output, such that neither (a) nor (b) bogs down the PRIMARY or the replica set as a whole. That's the high-level motivation I think. |
| Comment by Azat Khuzhin [ 17/May/12 ] |
|
Also, the affected version is not only 2.0.1 but 2.1 too. In which version are you planning to implement this feature? |
| Comment by Azat Khuzhin [ 17/May/12 ] |
|
@eliot, but what if the master changes at some step of the MR job as things stand now? Does it break something? |
| Comment by Eliot Horowitz (Inactive) [ 17/May/12 ] |
|
We should add an option to output to a different server, and/or the primary of a set. |
| Comment by Azat Khuzhin [ 16/May/12 ] |
|
Why is it not possible to write the output to the master directly? What if we have output of 120 million rows? Also, what would happen if the master changed at some step of the MR job? |
| Comment by Chris Westin [ 12/Jan/12 ] |
|
Apparently, this worked in the past, but has become broken. Note the training course materials also say this works, in the Replica Sets section. |