Core Server / SERVER-17426

Aggregation framework query by _id returns duplicates in sharded cluster (orphan documents)

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:

      mlaunch --sharded 2 --config 1 --single --binarypath <bin_version>
       
      use duptest
      db.dup.insert({_id: 2, a: 2, b: 2})
      db.dup.insert({_id: 1, a: 1, b: 1})
      db.dup.ensureIndex({a: 1})
      sh.enableSharding("duptest")
      sh.shardCollection("duptest.dup", {a: 1})
       
      sh.splitAt("duptest.dup", {a: 2})
       
      # If needed, perform moveChunk.
      sh.moveChunk("duptest.dup", {a: 2}, "shard02")
       
       
      # After the split, the chunk that extends to MaxKey holds the a = 2 document.
      # Insert the following document directly on that shard to simulate an orphan document.
      db.dup.insert({_id: 1, a: 1, b: 12})
       
      # This returns two documents; it should return only one
      db.dup.aggregate([{$match: {_id: 1} }])
       
      # count() is expected to return 2 by its current design
      db.dup.count({_id: 1})

      Description

      In a sharded cluster, an aggregation that matches on _id alone returns duplicate documents when orphan copies of a document exist on other shards.

      If the document with _id = 1 exists on shard A (active) and on shard B (orphan), the following aggregate returns 2 documents:

      db.test.aggregate([{$match: {_id: 1} }])
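
The scenario can be modelled without a cluster. The sketch below is a toy model only: plain arrays stand in for shards, and the names `chunks`, `shards`, `owns` and `matchById` are hypothetical, not MongoDB APIs. It assumes the chunk layout from the reproduction steps (a split at a = 2) and shows why merging unfiltered per-shard results yields two documents, while filtering by chunk ownership yields one.

```javascript
// Toy model only: plain arrays stand in for shards; none of this is MongoDB code.
// Assumed chunk ranges on the shard key `a` after the split at a = 2.
const chunks = [
  { min: -Infinity, max: 2, shard: "shardA" }, // a < 2
  { min: 2, max: Infinity, shard: "shardB" },  // a >= 2
];

const shards = {
  shardA: [{ _id: 1, a: 1, b: 1 }],
  // shardB holds its own document plus an orphan copy of _id 1.
  shardB: [{ _id: 2, a: 2, b: 2 }, { _id: 1, a: 1, b: 12 }],
};

// A shard owns a document when the document's shard key falls in one of its chunks.
function owns(shardName, doc) {
  return chunks.some(
    (c) => c.shard === shardName && doc.a >= c.min && doc.a < c.max
  );
}

// _id is not the shard key, so the match goes to every shard and the results are merged.
function matchById(id, filterOrphans) {
  const results = [];
  for (const [name, docs] of Object.entries(shards)) {
    for (const doc of docs) {
      if (doc._id !== id) continue;
      if (filterOrphans && !owns(name, doc)) continue;
      results.push(doc);
    }
  }
  return results;
}

console.log(matchById(1, false).length); // 2: the orphan on shardB is included
console.log(matchById(1, true).length);  // 1: only the owned copy on shardA
```

The unfiltered call corresponds to the behaviour reported here; the filtered call corresponds to the result the reporter expects.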
      

      Notably, the documentation suggests this approach (combined with a $group stage) as the way to get an accurate count in a sharded cluster.
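
For reference, a count pipeline of that shape might look like the sketch below. The variable name `countPipeline` is illustrative, and the collection `duptest.dup` is taken from the reproduction steps. On an affected build with the orphan present, this would presumably report the inflated count rather than 1.

```javascript
// Match by _id, then count the matching documents with a $group stage.
var countPipeline = [
  { $match: { _id: 1 } },
  { $group: { _id: null, count: { $sum: 1 } } }
];

// In the mongo shell, run it against the reproduction collection:
//   db.dup.aggregate(countPipeline)
```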

      The incorrect behaviour is not observed in 2.4.x or 3.0.0-rc11.
