Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17426

Aggregation framework query by _id returns duplicates in sharded cluster (orphan documents)

    • Fully Compatible
    • ALL
    • Hide
      mlaunch --sharded 2 --config 1 --single --binarypath <bin_version>
      
      use duptest
      db.dup.insert({_id: 2, a: 2, b: 2})
      db.dup.insert({_id: 1, a: 1, b: 1})
      db.dup.ensureIndex({a: 1})
      sh.enableSharding("duptest")
      sh.shardCollection("duptest.dup", {a: 1})
      
      sh.splitAt("duptest.dup", {a: 2})
      
      # If needed, perform moveChunk.
      sh.moveChunk("duptest.dup", {a: 2}, "shard02")
      
      
      # After split the MaxKey collection should be having the a = 2 document, on that shard add the document to have effect of orphan document
      db.dup.insert({_id: 1, a: 1, b: 12})
      
      # This results in two records while ideally it should be only 1
      db.dup.aggregate([{$match: {_id: 1} }])
      
      # This is expected to return 2 by its current design
      db.dup.count({_id: 1})
      
      Show
      mlaunch --sharded 2 --config 1 --single --binarypath <bin_version> use duptest db.dup.insert({_id: 2, a: 2, b: 2}) db.dup.insert({_id: 1, a: 1, b: 1}) db.dup.ensureIndex({a: 1}) sh.enableSharding("duptest") sh.shardCollection("duptest.dup", {a: 1}) sh.splitAt("duptest.dup", {a: 2}) # If needed, perform moveChunk. sh.moveChunk("duptest.dup", {a: 2}, "shard02") # After split the MaxKey collection should be having the a = 2 document, on that shard add the document to have effect of orphan document db.dup.insert({_id: 1, a: 1, b: 12}) # This results in two records while ideally it should be only 1 db.dup.aggregate([{$match: {_id: 1} }]) # This is expected to return 2 by its current design db.dup.count({_id: 1})

      In sharded cluster aggregation by _id alone results in duplicate records in case of orphan records exists on the other shards.

      If the record with _id = 1 exists on the shard A (active) and shard B (orphan), the following aggregate will return 2 records.

      db.test.aggregate([{$match: {_id: 1} }])
      

      It just happens that this is the suggested way of getting an accurate (along with $group stage) count in sharded cluster in the documentation.

      The incorrect behaviour is not noticed in 2.4.x and 3.0.0-rc11.

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            anil.kumar Anil Kumar
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: