[SERVER-8853] MapReduce jobs execute exclusively on primaries Created: 05/Mar/13  Updated: 06/Dec/22  Resolved: 04/Feb/22

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Narrell Assignee: Backlog - Query Optimization
Resolution: Done Votes: 8
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS release 6.3 (Final) 2.6.32-279.9.1.el6.x86_64
Java driver


Attachments: Text File MapReduceTest.java    
Issue Links:
Depends
Related
related to SERVER-5504 Allow map-reduce jobs to run on repli... Closed
related to SERVER-41455 Support running $out or $merge from a... Closed
Assigned Teams:
Query Optimization
Operating System: Linux
Steps To Reproduce:

All the mongod instances are @ 2.4.0-rc1
All the mongos instances are @ 2.4.0-rc1 (or at least all the ones we're using to test/demonstrate the MapReduce issue)
All the config server instances are @ 2.4.0-rc1
The MongoDB Java driver is @ 2.10.1
We set the ReadPreference to secondary in every place possible
We execute the MapReduce job
Connect to a primary and a secondary on one of our shards (scatter/gather).
Via currentOp() we can observe that the job only runs on the primary.

Participants:

 Description   

We've upgraded our cluster to 2.4.0-rc1 and we're still not seeing the MapReduce jobs execute on the secondaries. We were hoping SERVER-7423 would be included.

Attached is a sample Java application which can be used to illustrate the issue.

Current state of affairs in 2.5.0:
There are a few issues preventing M/R from running on secondaries:

  1. We don't pass query options to map reduce, so the command itself can't tell whether the slaveOk bit is set.
  2. We currently write the temporary output to a tmp database, which is not allowed on secondaries because it is a write operation.
  3. Routing, and keeping track of which nodes the M/R job runs on, is a problem because sharded map reduce is done in 2 stages:
    1. 1st stage: Run mapReduce on every shard.
    2. 2nd stage: Run mapReduce.shardedfinish on every shard. The 2nd stage involves aggregating the results from all other shards and running finalReduce on them.
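
The two stages above can be modeled with a small in-memory sketch. This is not the server's implementation; it is an illustration that assumes a word-count job, and the names ShardedMapReduceSketch, runOnShard, and finalReduce are hypothetical. It shows why the second stage has to know which nodes produced the first-stage partial results: the final reduce can only merge what it can locate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two-stage sharded map-reduce flow.
// Each "shard" is just a list of strings; the job is an assumed word count.
public class ShardedMapReduceSketch {

    // Stage 1: a shard maps and reduces only its own documents,
    // producing a partial result (word -> count on that shard).
    static Map<String, Integer> runOnShard(List<String> docs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String doc : docs) {
            for (String word : doc.split("\\s+")) {
                if (!word.isEmpty()) {
                    partial.merge(word, 1, Integer::sum);
                }
            }
        }
        return partial;
    }

    // Stage 2: gather the partial results from every shard and run a
    // final reduce over them (here, summing counts per word).
    static Map<String, Integer> finalReduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> merged.merge(word, count, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("a b a", "c"),
                Arrays.asList("b c c"));

        // Stage 1 runs once per shard; the caller must remember where
        // each partial result lives so stage 2 can collect it.
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> shard : shards) {
            partials.add(runOnShard(shard));
        }

        System.out.println(finalReduce(partials)); // prints {a=2, b=2, c=3}
    }
}
```

In this model the partial results are plain return values. In the situation described in the ticket they are temporary output written on a specific node, which is why items 2 and 3 interact: the second stage must be routed to wherever the first stage actually ran.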


 Comments   
Comment by Esha Bhargava [ 04/Feb/22 ]

Closing these tickets as part of the deprecation of mapReduce.

Comment by David Storch [ 17/Oct/19 ]

The query team should re-triage this. It sounds like the new agg-based implementation for MR will address this ticket.

Generated at Thu Feb 08 03:18:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.