[SERVER-4364] Method for running large mapReduce operations in a working replica set Created: 23/Nov/11 Updated: 06/Dec/22 Resolved: 04/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 1.8.4, 2.0.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | T. Dampier | Assignee: | Backlog - Query Optimization |
| Resolution: | Done | Votes: | 7 |
| Labels: | mapreduce, replication | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
replica set – e.g., a PRIMARY, a SECONDARY, and an ARBITER – receiving a steady trickle of read & write operations |
||
| Issue Links: |
|
||||
| Assigned Teams: |
Query Optimization
|
||||
| Participants: | |||||
| Description |
|
As a user of a MongoDB database served from an N-node replica set (running in --auth mode) that is responding to a steady trickle of read and write requests, I want to be able to complete a large map/reduce task without material disruption to the operation of the replica set.

By "large," I mean that I cannot guarantee the result set fits in a single 16MB document – i.e., my task is a poor candidate for "out: { inline: 1 }" (the two output modes are sketched just after the list below). By "material disruption," I mean a variety of pathologies I'd like to avoid if possible. For example:

1. the job monopolizing the PRIMARY – consuming system resources, tying up the one and only JavaScript thread, and holding the write lock for extended periods;
2. the bulk result set, replicated far faster than any natural insertion of data, flooding the SECONDARY nodes and overflowing their oplogs;
3. the PRIMARY becoming unresponsive and dragging down every application that depends on the database.
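To make the distinction concrete, here is a minimal sketch of the two output modes contrasted above, using a hypothetical "events" collection and a per-user count job (all names are illustrative):

```javascript
// Hypothetical map/reduce job: count events per user in an "events" collection.
var mapFn = function () { emit(this.userId, 1); };
var reduceFn = function (key, values) { return Array.sum(values); };

// Inline output: the entire result set is returned in the command reply and
// must fit within a single 16MB document, so it only suits small results.
db.events.mapReduce(mapFn, reduceFn, { out: { inline: 1 } });

// Collection output: results are written to an output collection, which means
// the command has to run where writes are allowed, i.e. on the PRIMARY.
db.events.mapReduce(mapFn, reduceFn, { out: { replace: "mr_results" } });
```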
For the sake of argument, let's further assume my map/reduce task is not amenable to an incremental application of successive mapReduce calls (that pattern is sketched after this description), and that I would potentially like to preserve the map/reduce output collection as a target for future queries – though it would be acceptable to have to move that result data around among hosts manually.

I've spoken with a couple of members of the 10gen engineering team, and they've been very helpful in brainstorming approaches and workarounds. But for the sake of the product, I want to make sure the use case is captured here, as I don't think any of the approaches I'm currently aware of addresses it satisfactorily.

Working within the current constraints, here's where we end up. The "large" / no-inline requirement forces us to execute the mapReduce() call on the PRIMARY, because an output collection has to be created and only the PRIMARY is permitted to perform those writes. But today, for a certain class of job, executing on the PRIMARY is a non-starter: it consumes too many system resources, monopolizes the one and only JavaScript thread, holds the write lock for extended periods, etc. – risking pathology #1 above. Even if that failure mode is dodged, we run the risk of pathology #2, as the large result set (created much faster than any more "natural" insertion of data) floods the SECONDARY nodes and overflows their oplogs. And failure mode #3 is always a concern – the PRIMARY is too critical to the overall responsiveness of any app that uses the database.

That's the problem, in a nutshell. Some potential, suggested solutions – all of which involve changes to the core server:

A. If we could relax or finesse the only-the-PRIMARY-can-write constraint for this kind of operation, we could run the mapReduce() call on a SECONDARY – perhaps even a "hidden" node – and leave the overall cluster and any apps almost completely unaffected. We would then take responsibility for getting the result collection out of that node as a separate task. Currently, though, this does not seem possible – cf. the related issue.

B. Alternately, if we could alter the mapReduce() command to allow "out:" to target a remote database – perhaps in an entirely disjoint replica set – a secondary could run the calculation and save the result set without violating the proscription against writing on a secondary.

C. Another possibility is to flag the result set for exclusion from replication – though this leaves us running on the PRIMARY, where failure modes #1 and #3 remain an issue.

My intent here, though, is not to describe or attach to any particular solution; rather, I'm just seeking to articulate the problem standing in the way of (what I consider) a rather important use case.

Thanks for reading this far!

TD |
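As an illustration of the incremental pattern ruled out in the description, here is a minimal sketch that reuses the hypothetical mapFn/reduceFn from the earlier example and assumes a ts timestamp field on each document:

```javascript
// Incremental map/reduce: process only documents that arrived since the last
// run and fold the partial results into the existing output collection via
// the reduce function. Not applicable when the job must recompute everything.
var lastRunTimestamp = ISODate("2011-11-01T00:00:00Z");  // hypothetical cutoff

db.events.mapReduce(mapFn, reduceFn, {
    query: { ts: { $gt: lastRunTimestamp } },  // only new documents
    out: { reduce: "mr_results" }              // re-reduce into existing output
});
```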
| Comments |
| Comment by Esha Bhargava [ 04/Feb/22 ] |
|
Closing these tickets as part of the deprecation of mapReduce. |
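For anyone arriving here after that deprecation: the usual replacement for this kind of job is the aggregation pipeline, where $merge writes an arbitrarily large result set to an output collection (and can target a different database on the same cluster). A rough, hypothetical equivalent of the per-user count sketch from the description:

```javascript
// Aggregation-pipeline equivalent of the per-user count job sketched above.
// $merge streams results into an output collection, so there is no 16MB
// single-document limit as with inline map/reduce output.
db.events.aggregate([
    { $group: { _id: "$userId", count: { $sum: 1 } } },
    { $merge: { into: "mr_results", whenMatched: "replace", whenNotMatched: "insert" } }
]);
```

For pipelines whose intermediate state is large, passing { allowDiskUse: true } to aggregate() allows blocking stages such as $group to spill to disk.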
| Comment by Eliot Horowitz (Inactive) [ 28/Nov/11 ] |
|
Not sure we'll have a great solution in 2.2 for this - but we know it's an issue and are working on some ideas. |