[SERVER-13981] Temporary map/reduce collections are incorrectly replicated to secondaries Created: 18/May/14  Updated: 11/Jun/15  Resolved: 20/May/14

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.5.5, 2.6.0, 2.6.1
Fix Version/s: 2.6.2, 2.7.1

Type: Bug
Priority: Critical - P2
Reporter: Linda Qin
Assignee: Mathias Stearn
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-10154 Suppress replication of temporary col... Closed
related to SERVER-14168 Warning logged when incremental MR co... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of May 27, 2014

ISSUE SUMMARY
Starting with the 2.6 release, certain temporary map/reduce collections are incorrectly replicated to secondary nodes, which adds traffic between replica set members. In addition, the documents in these collections have no _id field, which causes collection scans during replication on the primary and can degrade performance.

USER IMPACT
Large map/reduce jobs with millions of documents can noticeably degrade server performance and increase oplog churn, and with it the network traffic between replica set members.

WORKAROUNDS
There is no workaround that prevents inserts to the temporary collections from being replicated. If the impact on the server becomes intolerable, move the m/r job to a dedicated hidden secondary node to mitigate the issue.
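As a rough sketch of the configuration side of this mitigation, the helper below marks one member of a replica set configuration document as hidden. The function name withHiddenMember, the host names, and the set name are hypothetical and not taken from this ticket; the priority and hidden member fields are standard replica set configuration fields, and a hidden member must have priority 0. The snippet only transforms a plain object and runs under Node.js; in a mongo shell the input would typically come from rs.conf() and the result would be passed to rs.reconfig().

```javascript
// Hypothetical helper (not part of MongoDB): returns a copy of a replica set
// configuration document in which the member with the given host is hidden.
function withHiddenMember(config, host) {
  // Deep-copy so the caller's config object is left untouched.
  var copy = JSON.parse(JSON.stringify(config));
  var found = false;
  for (var i = 0; i < copy.members.length; i++) {
    if (copy.members[i].host === host) {
      copy.members[i].priority = 0; // hidden members must have priority 0
      copy.members[i].hidden = true;
      found = true;
    }
  }
  if (!found) {
    throw new Error("no member with host " + host);
  }
  return copy;
}

// Example input; the set name and hosts are made up for illustration.
var exampleConfig = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "mr-node.example.net:27017" }
  ]
};

var updatedConfig = withHiddenMember(exampleConfig, "mr-node.example.net:27017");
console.log(JSON.stringify(updatedConfig.members[2]));
```

The helper returns a new object rather than mutating its argument, so the original configuration stays available for comparison before anything is applied.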

AFFECTED VERSIONS
MongoDB 2.6.0 and 2.6.1 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.2 production release.

RESOLUTION DETAILS
Documents in temporary *_inc collections are explicitly not replicated. This restores the behavior prior to development version 2.5.5.

Original Description

Run the map reduce example on a 2.6 replica set.

From the oplog, I can see that mapReduce generates tmp collections <database.tmp.mr.collection_x_inc> whose documents have no _id field. This would cause performance issues when these tmp collections are replicated to the secondaries.

> db.oplog.rs.find({ns:/_inc/})
{ "ts" : Timestamp(1400390715, 1), "h" : NumberLong("9062785211345050513"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 36, "1" : 256193 } }
{ "ts" : Timestamp(1400390715, 2), "h" : NumberLong("-6347065931779322235"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 1, "1" : 237298 } }
{ "ts" : Timestamp(1400390715, 3), "h" : NumberLong("5305159503718125362"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 2, "1" : 247543 } }
{ "ts" : Timestamp(1400390715, 4), "h" : NumberLong("242292647194800186"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 3, "1" : 246875 } }
{ "ts" : Timestamp(1400390715, 5), "h" : NumberLong("-3801567793714329373"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 4, "1" : 250808 } }
{ "ts" : Timestamp(1400390715, 6), "h" : NumberLong("-7467661728084668641"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 5, "1" : 266786 } }
{ "ts" : Timestamp(1400390715, 7), "h" : NumberLong("48771082501147428"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 6, "1" : 239294 } }
{ "ts" : Timestamp(1400390715, 8), "h" : NumberLong("7947765402550217396"), "v" : 2, "op" : "i", "ns" : "test.tmp.mr.docs_0_inc", "o" : { "0" : 7, "1" : 246862 } }
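To gauge how much of the oplog is taken up by entries like these, a small summary can help. The sketch below counts insert entries per temporary "_inc" namespace and how many of them lack an _id field. The function name summarizeIncInserts and the namespace pattern are assumptions based only on the sample output above; the sample array reuses two of the entries shown, reduced to the fields the function reads. It runs under Node.js on plain objects; in a mongo shell the entries could come from db.oplog.rs.find({ns:/_inc/}).toArray().

```javascript
// Hypothetical helper (not part of MongoDB): summarizes oplog insert entries
// that target temporary map/reduce "_inc" collections.
function summarizeIncInserts(oplogEntries) {
  // Namespace shape assumed from the sample output: <db>.tmp.mr.<name>_inc
  var incNamespace = /^[^.]+\.tmp\.mr\..+_inc$/;
  var summary = {};
  for (var i = 0; i < oplogEntries.length; i++) {
    var entry = oplogEntries[i];
    // Only insert operations ("op" : "i") on matching namespaces are counted.
    if (entry.op !== "i" || !incNamespace.test(entry.ns)) {
      continue;
    }
    if (!summary[entry.ns]) {
      summary[entry.ns] = { inserts: 0, missingId: 0 };
    }
    summary[entry.ns].inserts += 1;
    if (!Object.prototype.hasOwnProperty.call(entry.o, "_id")) {
      summary[entry.ns].missingId += 1;
    }
  }
  return summary;
}

// Two entries from the output above plus one ordinary insert for contrast.
var sampleEntries = [
  { op: "i", ns: "test.tmp.mr.docs_0_inc", o: { "0": 36, "1": 256193 } },
  { op: "i", ns: "test.tmp.mr.docs_0_inc", o: { "0": 1, "1": 237298 } },
  { op: "i", ns: "test.docs", o: { _id: 1, value: "ordinary insert" } }
];

console.log(JSON.stringify(summarizeIncInserts(sampleEntries)));
// prints {"test.tmp.mr.docs_0_inc":{"inserts":2,"missingId":2}}
```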



 Comments   
Comment by Githook User [ 21/May/14 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-13981 MR inc collection should not be replicated

Introduced in commit 8416afb7c5724076b1231626f27f5198a5a2cce7. Prior to that,
the collection was not replicated.

(cherry picked from commit 65ca787cfe1c287641cd859a8c7cae9e6cbde7f0)
Branch: v2.6
https://github.com/mongodb/mongo/commit/7c75cf1da8fe396eb0fff8dc9c7365e539ece611

Comment by Ramon Fernandez Marina [ 20/May/14 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-13981 MR inc collection should not be replicated

Introduced in commit 8416afb7c5724076b1231626f27f5198a5a2cce7. Prior to that,
the collection was not replicated.
Branch: master
https://github.com/mongodb/mongo/commit/65ca787cfe1c287641cd859a8c7cae9e6cbde7f0

Comment by Asya Kamsky [ 18/May/14 ]

moved comments about changed renameCollection across DBs behavior into SERVER-13984
