Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.6.1
Component/s: MapReduce
Labels: None

Operating System: ALL
Steps To Reproduce:
Run mongo < test.js on a sharded cluster twice.

Original steps

Create a sharded input collection.

Execute a map reduce with sharded output and be sure that the output collection has more than one chunk.

Repeat the execution with more data in the input collection in order to make the output grow but also to have results with the same key.

There is an issue when running a map reduce with sharded output in merge mode.
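A minimal sketch of the kind of command involved, assuming a simple counting job. The attached test.js is not reproduced here: the map and reduce functions, the `key` field, and the collection names `mr_input` and `mr_output` are all hypothetical. The `out` document uses the `merge` action together with `sharded: true`.

```javascript
// Hypothetical map function: counts documents per value of a `key` field.
// `emit` is provided by the server when the job runs; it is not defined here.
function mapFn() {
  emit(this.key, 1);
}

// Hypothetical reduce function: sums the counts emitted for one key.
function reduceFn(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Build a mapReduce command document with sharded output in merge mode.
// The collection names passed in are placeholders.
function buildMergeCommand(inputColl, outputColl) {
  return {
    mapReduce: inputColl,
    map: mapFn,
    reduce: reduceFn,
    out: { merge: outputColl, sharded: true }
  };
}

const cmd = buildMergeCommand("mr_input", "mr_output");
// In the mongo shell, this document would be passed to db.runCommand(cmd).
```

Running a command of this shape repeatedly, with the input growing between runs, matches the scenario described in this report.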

If the output collection has more than one chunk and some of the map reduce results have a key that is already stored in the output collection, the map reduce fails with:
"exception: insertDocument :: caused by :: 11000 E11000 duplicate key error index"
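For comparison, the `merge` output action is documented to overwrite an existing document that has the same key as a new result, which is why a duplicate key error is unexpected here. The toy model below illustrates that expected behavior with an in-memory `Map`. It is only an illustration under that assumption, not the server's implementation, and the sample documents are made up.

```javascript
// Toy model of the documented "merge" semantics: a result whose _id already
// exists replaces the stored document; any other result is added.
function mergeResults(outputById, results) {
  for (const doc of results) {
    // Upsert by _id, so a key collision can never raise an error in this model.
    outputById.set(doc._id, doc);
  }
  return outputById;
}

// Existing output collection with one stored key.
const output = new Map([["a", { _id: "a", value: 1 }]]);

mergeResults(output, [
  { _id: "a", value: 5 }, // key already stored: overwritten
  { _id: "b", value: 2 }  // new key: added
]);
```

The failure reported above corresponds to the first case being handled as a plain insert instead of an overwrite.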

At first I thought it might be because I was using the same collection as input and as output. But it also happens when using different collections.

This doesn't happen if the output collection is unsharded or if it only has one chunk.

The map reduce was executed through the mongo shell and also through pymongo, with the same behavior in both.

This bug might not happen the first time you execute a map reduce on a collection with already stored keys. But after several executions that make the output collection grow and split into more chunks, the bug shows up.

I haven't tested what happens when the input collection is not sharded.

Attachments:
test.js (0.5 kB, Jul 25 2014 05:46:01 PM UTC)

duplicates

SERVER-7926 Map Reduce with sharded output can apply reduce on duplicate documents if a migration happened

Closed

is related to

SERVER-15024 mapReduce output to sharded collection leaves orphans and then uses them in subsequent map/reduce

Closed

Assignee: Randolph Tan
Reporter: Santiago Alessandri
Participants: Ramon Fernandez Marina, Randolph Tan, Santiago Alessandri, Serge A Terekhov [X]
Votes: 4
Watchers: 15

Created: Jun 16 2014 05:22:37 PM UTC
Updated: Jun 25 2015 06:39:36 PM UTC
Resolved: Aug 27 2014 06:44:27 PM UTC
