[SERVER-10154] Suppress replication of temporary collections used for mapReduce Created: 10/Jul/13  Updated: 19/Feb/16  Resolved: 19/Sep/14

Status: Closed
Project: Core Server
Component/s: MapReduce, Replication
Affects Version/s: 2.2.6, 2.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hiroaki Assignee: Unassigned
Resolution: Done Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux


Issue Links:
Related
is related to SERVER-13981 Temporary map/reduce collections are ... Closed
Operating System: ALL
Participants:

 Description   

The temporary collections that mapReduce creates (such as the ones below) are replicated, even though replicating them serves no purpose, and this happens very frequently:

  • "<db name>.tmp.mr.<collection name>_1"
  • "<db name>.tmp.mr.<collection name>_1_inc"

This generates very large amounts of oplog traffic, so performance deteriorates badly.

In many cases the primary node appears to hang.

The replication can be confirmed by looking at local.oplog.rs.
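
As a quick check (a minimal shell sketch; the namespace regex is only an illustration), the extra oplog traffic can be listed and counted from the primary:

// List recent oplog entries that touch the temporary map/reduce namespaces
db.getSiblingDB("local").oplog.rs.find({ ns: /\.tmp\.mr\./ }).sort({ $natural: -1 }).limit(20)
// Count them to gauge how much of the oplog they account for
db.getSiblingDB("local").oplog.rs.find({ ns: /\.tmp\.mr\./ }).count()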



 Comments   
Comment by Eric Milkie [ 19/Sep/14 ]

Certain temp collections from map/reduce are renamed into permanent collections; these must be replicated.
Other temp collections are transient; these are not currently replicated in the master branch.
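
To illustrate the distinction (a hedged sketch, reusing the map and reduce functions from the reproduction script further down; collection names are only examples):

// Output to a named collection: the job writes into <db>.tmp.mr.<name>_N and then
// renames that temp collection to "myoutput", so the rename must be replicated.
db.things.mapReduce(map, reduce, { out : "myoutput" });

// Inline output: results are returned in the command reply and no output
// collection is created, so there is nothing to rename.
db.things.mapReduce(map, reduce, { out : { inline : 1 } });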

Comment by Gianfranco Palumbo [ 18/Sep/13 ]

This happens on 2.2.6 as well as on 2.4.6:

On the PRIMARY:
 
db.myoutput.drop();
db.things.drop();
 
db.things.insert( { _id : 1, tags : ['dog', 'cat'] } );
db.things.insert( { _id : 2, tags : ['cat'] } );
db.things.insert( { _id : 3, tags : ['mouse', 'cat', 'dog'] } );
db.things.insert( { _id : 4, tags : []  } );
 
 
map = function() {
  // Emit each tag on this document with a count of 1
  var things = this;
  things.tags.forEach( function(elem) {
      emit( elem , { count : 1 } );
    }
  );
};
 
reduce = function( key , values ) {
  // Sum the per-tag counts emitted by the map function
  var total = 0;
  for ( var i=0; i<values.length; i++ )
    total += values[i].count;
  return { count : total };
};
 
 
res = db.things.mapReduce(map, reduce, { out : "myoutput" } );

Oplog

[PRIMARY] test> db.getSiblingDB("local").oplog.rs.find()
{ "ts": Timestamp(1379494215000, 1), "h": NumberLong("-8029276837284782194"), "v": 2, "op": "i", "ns": "test.things", "o": { "_id": 1, "tags": [ "dog", "cat" ] } }
{ "ts": Timestamp(1379494215000, 2), "h": NumberLong("2184773576383086701"), "v": 2, "op": "i", "ns": "test.things", "o": { "_id": 2, "tags": [ "cat" ] } }
{ "ts": Timestamp(1379494215000, 3), "h": NumberLong("4039601930849973355"), "v": 2, "op": "i", "ns": "test.things", "o": { "_id": 3, "tags": [ "mouse", "cat", "dog" ] } }
{ "ts": Timestamp(1379494215000, 4), "h": NumberLong("2671674817666989846"), "v": 2, "op": "i", "ns": "test.things", "o": { "_id": 4, "tags": [ ] } }
{ "ts": Timestamp(1379494215000, 5), "h": NumberLong("10765921238731960"), "v": 2, "op": "i", "ns": "test.system.indexes", "o": { "ns": "test.tmp.mr.things_0_inc", "key": { "0": 1 }, "name": "0_1" } }
{ "ts": Timestamp(1379494215000, 6), "h": NumberLong("-4428498763725511409"), "v": 2, "op": "c", "ns": "test.$cmd", "o": { "create": "tmp.mr.things_0", "temp": true } }
{ "ts": Timestamp(1379494215000, 7), "h": NumberLong("-3216128670775827979"), "v": 2, "op": "i", "ns": "test.tmp.mr.things_0", "o": { "_id": "cat", "value": { "count": 3 } } }
{ "ts": Timestamp(1379494215000, 8), "h": NumberLong("3507433566574634712"), "v": 2, "op": "i", "ns": "test.tmp.mr.things_0", "o": { "_id": "dog", "value": { "count": 2 } } }
{ "ts": Timestamp(1379494215000, 9), "h": NumberLong("-1875651706100864606"), "v": 2, "op": "i", "ns": "test.tmp.mr.things_0", "o": { "_id": "mouse", "value": { "count": 1 } } }
....

Secondary log:

Wed Sep 18 10:27:06 [repl writer worker 1] build index test.tmp.mr.things_0_inc { _id: 1 }
Wed Sep 18 10:27:06 [repl writer worker 1] build index done.  scanned 0 total records. 0 secs
Wed Sep 18 10:27:06 [repl writer worker 1] info: creating collection test.tmp.mr.things_0_inc on add index
Wed Sep 18 10:27:06 [repl writer worker 1] build index test.tmp.mr.things_0_inc { 0: 1 }
Wed Sep 18 10:27:06 [repl writer worker 1] build index done.  scanned 0 total records. 0 secs
Wed Sep 18 10:27:06 [repl writer worker 1] build index test.tmp.mr.things_0 { _id: 1 }
Wed Sep 18 10:27:06 [repl writer worker 1] build index done.  scanned 0 total records. 0 secs
Wed Sep 18 10:27:06 [repl writer worker 1] CMD: drop test.tmp.mr.things_0_inc

Comment by David Hows [ 09/Sep/13 ]

Hi Hiroaki Kubota,

MongoDB routinely renames these temporary collections into the final result.

Here is an example from my own execution of a small map/reduce.

{ "ts" : Timestamp(1378711038, 14), "h" : NumberLong("712068168650850520"), "v" : 2, "op" : "c", "ns" : "admin.$cmd", "o" : { "renameCollection" : "test.tmp.mr.mapreduce_0", "to" : "test.mrresult", "stayTemp" : false } }

Could you attach some of the oplog entries you mentioned earlier, along with logs showing the delays and the performance deterioration?

Thanks,
David
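
A simple way to spot those renames (a sketch; the query just matches the shape of the entry above) is to filter the oplog for rename commands on tmp.mr collections:

// Command entries that rename a temporary map/reduce collection to its final name
db.getSiblingDB("local").oplog.rs.find({ op: "c", "o.renameCollection": /\.tmp\.mr\./ })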

Comment by David Verdejo [ 11/Jul/13 ]

I have the same issue

Thu Jul 11 10:22:03.262 [repl writer worker 3] CMD: drop aggregate.tmp.mr.Statistics_FlightsBookingEngine_288732
Thu Jul 11 10:22:03.262 [repl writer worker 3] CMD: drop aggregate.tmp.mr.Statistics_FlightsBookingEngine_288732_inc
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_FlightsBookingEngine_288733_inc { _id: 1 }
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
Thu Jul 11 10:22:03.262 [repl writer worker 2] info: creating collection aggregate.tmp.mr.Statistics_FlightsBookingEngine_288733_inc on add index
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_FlightsBookingEngine_288733_inc { 0: 1 }
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_FlightsBookingEngine_288733 { _id: 1 }
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
Thu Jul 11 10:22:03.262 [repl writer worker 2] info: indexing in foreground on this replica; was a background index build on the primary
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_FlightsBookingEngine_288733 { _id.f: 1.0, _id.org: 1.0, _id.dst: 1.0, _id.dpt: 1.0, _id.dur: 1.0, _id.res: 1.0 }
Thu Jul 11 10:22:03.262 [repl writer worker 2] build index done. scanned 0 total records. 0 secs

I think one solution could be to create a temporary database on the primary, create the temporary mapReduce collections in it, and avoid replicating that database to the secondaries.
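
Until the server behaviour changes, one possible mitigation for small result sets (a sketch, with the caveats that inline output is limited by the 16MB reply size and that jsMode keeps intermediate data in memory, limited to roughly 500,000 distinct keys) is to avoid the output collection entirely:

// Inline output returns results in the command reply (no output collection to rename);
// jsMode keeps intermediate data in the JS engine rather than in an on-disk temp collection.
var res = db.things.mapReduce(map, reduce, { out : { inline : 1 }, jsMode : true });
printjson(res.results);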
