Core Server / SERVER-14269

Map-reduce fails with a duplicate-key error when the output mode is 'merge' and the output collection is sharded.

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.6.1
    • Component/s: MapReduce
    • Labels: None
    • Operating System: ALL
    • Steps To Reproduce:

      Run mongo < test.js on a sharded cluster twice.

      Original steps:
      • Create a sharded input collection.
      • Run a map-reduce with sharded output and make sure that the output collection has more than one chunk.
      • Repeat the map-reduce after adding more data to the input collection, so that the output grows and some of the new results have keys that already exist in the output collection.
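The contents of test.js are not reproduced here, so the following is only a hypothetical, in-memory Python model of the data shape the steps above aim for. It assumes a simple count-per-key map/reduce; `map_reduce_count` and the `key` field are illustrative names, not taken from test.js, and nothing here talks to a server. It shows that the second run yields results whose `_id` values overlap with the first run's output, which is one of the two conditions this report associates with the failure (the other being an output collection split into more than one chunk).

```python
from collections import Counter


def map_reduce_count(docs):
    """Hypothetical map/reduce: emit(doc["key"], 1), then reduce by summing.

    Returns documents shaped like map-reduce output: {"_id": key, "value": total}.
    """
    counts = Counter(doc["key"] for doc in docs)
    return [{"_id": k, "value": v} for k, v in sorted(counts.items())]


# Run 1: the initial input collection (keys k0..k4).
input_docs = [{"key": f"k{i % 5}"} for i in range(20)]
run1 = map_reduce_count(input_docs)

# Run 2: more input is added, reusing keys k0..k4 and introducing k5..k7.
input_docs += [{"key": f"k{i % 8}"} for i in range(16)]
run2 = map_reduce_count(input_docs)

# Results of run 2 whose _id already exists in run 1's output.
existing_ids = {doc["_id"] for doc in run1}
overlapping = sorted(doc["_id"] for doc in run2 if doc["_id"] in existing_ids)
new_keys = sorted(doc["_id"] for doc in run2 if doc["_id"] not in existing_ids)

print("overlapping:", overlapping)
print("new:", new_keys)
```

In 'merge' mode, every `_id` in `overlapping` would be written over a document that is already in the output collection.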

      There is an issue when using map-reduce with sharded output in 'merge' mode.

      If the output collection has more than one chunk and some of the map-reduce results have a key that is already stored in the output collection, the map-reduce fails with:
      "exception: insertDocument :: caused by :: 11000 E11000 duplicate key error index"
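According to the MongoDB documentation for the `out: { merge: ... }` option, when an existing output document has the same key as a new result, the existing document is overwritten rather than rejected. The sketch below is a hypothetical, in-memory model (not server code) of a collection with a unique `_id` index. It contrasts that documented merge behavior with a plain insert, which is the kind of operation that raises an E11000 duplicate key error when the key already exists. `OutputCollection` and `DuplicateKeyError` are illustrative names only.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the server's E11000 duplicate key error."""


class OutputCollection:
    """Hypothetical in-memory model of an output collection with a unique _id index."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        # A plain insert must fail if the _id already exists (unique index).
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(f"E11000 duplicate key error: _id {doc['_id']!r}")
        self.docs[doc["_id"]] = dict(doc)

    def merge(self, results):
        # Documented 'merge' semantics: overwrite documents whose key already
        # exists, insert documents whose key is new. This never raises.
        for doc in results:
            self.docs[doc["_id"]] = dict(doc)


out = OutputCollection()
out.merge([{"_id": "a", "value": 1}, {"_id": "b", "value": 2}])  # first run
out.merge([{"_id": "b", "value": 5}, {"_id": "c", "value": 1}])  # second run, "b" repeats

try:
    out.insert({"_id": "b", "value": 7})  # plain insert of an existing key
except DuplicateKeyError as exc:
    print(exc)

print(sorted(out.docs.values(), key=lambda d: d["_id"]))
```

Under this model, repeated `merge` calls with overlapping keys succeed, and only `insert` produces a duplicate-key error, which matches the `insertDocument` prefix in the reported message.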

      At first I thought this might be because I was using the same collection as both input and output, but it also happens when the two collections are different.

      This does not happen if the output collection is unsharded or has only one chunk.

      I ran the map-reduce both from the mongo shell and through PyMongo, with the same behavior.

      The bug might not appear the first time a map-reduce runs against an output collection that already contains the keys. However, after several runs that make the output collection grow and split into more chunks, it shows up.

      I haven't tested what happens when the input collection is not sharded.

      Attachment: test.js (0.5 kB, Ramon Fernandez Marina)

            Assignee: Randolph Tan
            Reporter: Santiago Alessandri
            Votes: 4
            Watchers: 15
