Core Server / SERVER-1694

Corruption during mapreduce on documents with arrays of binary data.

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 1.7.0
    • Affects Version/s: 1.6.1
    • None
    • Environment: Linux

      Running mapreduce on an input collection that only has 2 different keys (0x00010203 and 0xaabbccdd) results in random keys in the output collection:

      cdc$ ./mr
      { "result" : "out", "timeMillis" : 9, "counts" : { "input" : 20, "emit" : 40, "output" : 5 }, "ok" : 1 }
      key: 00102030
      key: 16000000
      key: 337f0000
      key: 80ad142c
      key: aabbccdd
      cdc$ ./mr
      { "result" : "out", "timeMillis" : 9, "counts" : { "input" : 20, "emit" : 40, "output" : 6 }, "ok" : 1 }
      key: 00000000
      key: 00102030
      key: 16000000
      key: 337f0000
      key: 80bc0f2c
      key: aabbccdd
      cdc$ ./mr
      { "result" : "out", "timeMillis" : 9, "counts" : { "input" : 20, "emit" : 40, "output" : 6 }, "ok" : 1 }
      key: 00102030
      key: 16000000
      key: 337f0000
      key: 7071122c
      key: 803c122c
      key: aabbccdd
      cdc$

      Steps to reproduce:

      1) Insert a few documents with arrays of binary data.
      2) Run a mapreduce that emits bindata from said arrays as keys.
      3) Notice more keys in the output collection than exist in the input collection; some are correct, some are random.
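
For comparison, the expected outcome of these steps can be sketched without a server. The following is a plain Node.js simulation, not the attached mr.cpp and not MongoDB's mapreduce implementation. The document shape (an array `k` holding both keys in every document), the key values taken from the description, and the counting reduce are all assumptions made for illustration; the name `mapReduce` here is a local helper, not a driver call.

```javascript
// Hypothetical stand-in for the input collection: 20 documents, each with
// an array k holding the two 4-byte keys named in the description.
const KEYS = ["00010203", "aabbccdd"];
const docs = Array.from({ length: 20 }, (_, i) => ({
  i,
  k: KEYS.map((hex) => Buffer.from(hex, "hex")),
}));

// Minimal in-memory map/reduce: the map step emits (key, 1) for every
// array element, and the reduce step sums the emitted values per key.
function mapReduce(input) {
  let emitted = 0;
  const out = new Map();
  for (const doc of input) {
    for (const bin of doc.k) {
      const key = bin.toString("hex");
      out.set(key, (out.get(key) || 0) + 1);
      emitted += 1;
    }
  }
  return {
    counts: { input: input.length, emit: emitted, output: out.size },
    out,
  };
}

const result = mapReduce(docs);
console.log(result.counts);
for (const key of [...result.out.keys()].sort()) {
  console.log("key: " + key);
}
```

Under these assumptions the input and emit counts match the 20 and 40 reported by the server, but the output count is 2, one entry per distinct key, rather than the 5 or 6 seen in the runs above.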

      I'm quite sure the corruption happens during the mapreduce operation; the bindata arrays are stored correctly in the db itself. However, the mongo js shell has some trouble interpreting them consistently:

      > db.in.findOne({i : 0}, {k : 1})
      {
      "_id" : ObjectId("4c77f6c1ec4e2dee21a74d9a"),
      "k" : [
      BinData(0,"/38AAA=="),
      BinData(0,"YLzjAQ==")
      ]
      }
      > db.in.findOne({i : 0}, {k : 1})
      {
      "_id" : ObjectId("4c77f6c1ec4e2dee21a74d9a"),
      "k" : [
      BinData(0,"AAAAAA=="),
      BinData(0,"awDM3Q==")
      ]
      }

      Again, when accessing the k array via the mongoclient library, the data is NOT corrupted; this bug in the js client could be completely unrelated.
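
To make the difference between the two shell results easier to see, the base64 payloads of the BinData values can be decoded to hex. This is a minimal Node.js sketch: `Buffer` is Node's built-in class, `b64ToHex` is a helper name made up for illustration, and the "expected" encodings assume the two keys are exactly the byte strings written in the description.

```javascript
// Decode the base64 payloads of the BinData values printed by the shell.
// b64ToHex is a hypothetical helper name; Buffer is built into Node.js.
const b64ToHex = (b64) => Buffer.from(b64, "base64").toString("hex");

// Values returned by the two identical findOne calls in the report.
const firstCall = ["/38AAA==", "YLzjAQ=="].map(b64ToHex);
const secondCall = ["AAAAAA==", "awDM3Q=="].map(b64ToHex);

// The two keys as written in the description, encoded the way the shell
// would be expected to print them if it read the stored bytes correctly.
const expectedB64 = ["00010203", "aabbccdd"].map((hex) =>
  Buffer.from(hex, "hex").toString("base64")
);

console.log(firstCall);   // ff7f0000, 60bce301
console.log(secondCall);  // 00000000, 6b00ccdd
console.log(expectedB64); // AAECAw==, qrvM3Q==
```

The same document decodes to different bytes on each call, and only the trailing bytes `ccdd` of the last value line up with one of the stated keys. That would be consistent with the shell misreading the array elements rather than the stored data changing between calls.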

      I've attached mr.cpp that reproduces the behaviour.

            Assignee: Eliot Horowitz (Inactive)
            Reporter: gvs