Core Server / SERVER-30708

_id index returning more than one document with same _id in aggregations and counts.

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • None
    • Affects Version/s: 3.4.6
    • Labels:
      None
    • Environment:
      Ubuntu 14.04
    • ALL

      I don't know. One of my employees found this, and I don't know how to explain it to my customers or even to my colleagues.

      I've found a strange behavior in one of my sharded clusters.

      I will keep this report brief because I really don't know how to explain this. I just need to know what data is needed to debug this behavior and help fix it as soon as possible, because it is impacting some of my customers.

      Basically, I have a sharded cluster in production with 7 nodes, a replication factor of 3, 6 mongos, and 3 config servers.

      All of these nodes were created with binary version 3.4.4, but they have since passed through other versions (while trying to avoid another bug): 3.4.0, 3.4.3 and 3.4.6. They are currently on version 3.4.6.

      My shard key is a hashed index on a custom string field named "pid". This field is basically an object identifier, but the same value can repeat across many documents, for example a thousand times.
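
      As general background on a shard key like this one (not a diagnosis of this report): MongoDB documents that when "_id" is neither the shard key nor a prefix of it, the _id index enforces uniqueness per shard only, not across shards. The plain JavaScript toy model below sketches that idea. It is not MongoDB code; `ToyCluster`, `toyHash`, the two-shard layout, and the sample values are all hypothetical, and the stand-in hash is not MongoDB's hashed-index function.

```javascript
// Toy model, NOT MongoDB code: each shard keeps its own _id index,
// and routing depends only on the shard key ("pid"), never on _id.

// Stand-in hash for illustration only; MongoDB's real hashed-index
// function is different.
function toyHash(str) {
  let sum = 0;
  for (const ch of str) sum += ch.codePointAt(0);
  return sum;
}

class ToyCluster {
  constructor(shardCount) {
    // One Map per shard, keyed by _id: uniqueness is checked per shard only.
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // Pick the shard from the hashed shard key value.
  shardFor(pid) {
    return toyHash(pid) % this.shards.length;
  }

  insert(doc) {
    const shard = this.shards[this.shardFor(doc.pid)];
    if (shard.has(doc._id)) {
      throw new Error("duplicate key on this shard: " + doc._id);
    }
    shard.set(doc._id, doc);
  }

  // Lookup by _id alone cannot be routed, so every shard is asked
  // and the results are merged.
  findById(id) {
    return this.shards.filter((s) => s.has(id)).map((s) => s.get(id));
  }
}

const cluster = new ToyCluster(2);
cluster.insert({ _id: "x1", pid: "a" }); // toyHash("a") = 97 -> shard 1
cluster.insert({ _id: "x1", pid: "b" }); // toyHash("b") = 98 -> shard 0
console.log(cluster.findById("x1").length); // 2
```

      In this sketch both inserts succeed because the two "pid" values route to different shards, while a second insert of the same "_id" on the same shard is rejected.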

      My database has 500 million documents.

      All insertions into this database are performed by applications using Java driver version 3.4.2. We had some old applications using Java driver 3.2.2 many months ago, but this cluster was created just a month ago and I really don't believe those applications added any documents to it. The example below is about one document added to the database with Java driver 3.4.2.

      Explaining the problem with commands:

      This find in the mongo shell returns only one document:

      mongos> db.investigation_cards.find({_id : ObjectId("5988e4ea8584c230ad486e43")}, {_id : 1, pid : 1})
      { "_id" : ObjectId("5988e4ea8584c230ad486e43"), "pid" : "10155585763579589_1923551154567182" }
      

      This count, however, reports two documents in the mongo shell:

      mongos> db.investigation_cards.count({_id : ObjectId("5988e4ea8584c230ad486e43")})
      2
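
      One documented situation in which count() on a sharded collection can report more than find() returns is the presence of orphaned documents or an in-progress chunk migration. Whether that is what happened here is not established by this report. The plain JavaScript toy model below only sketches the mechanism under that assumption. It is not MongoDB code; the `shards` layout, the `bucket` field, and the functions `filteredFind` and `unfilteredCount` are hypothetical names made up for illustration.

```javascript
// Toy model, NOT MongoDB code. Each shard stores documents plus the set of
// hashed-key buckets ("chunks") it currently owns. An orphan is a document
// stored on a shard that does not own that document's bucket.

const shards = [
  {
    ownedBuckets: new Set([0]),
    docs: [{ _id: "x1", pid: "p1", bucket: 0 }], // owned copy
  },
  {
    ownedBuckets: new Set([1]),
    docs: [{ _id: "x1", pid: "p1", bucket: 0 }], // orphan: bucket 0 is owned elsewhere
  },
];

// Read path with an ownership filter: orphaned copies are skipped.
function filteredFind(id) {
  return shards.flatMap((s) =>
    s.docs.filter((d) => d._id === id && s.ownedBuckets.has(d.bucket))
  );
}

// Count path without the filter: every stored copy is counted.
function unfilteredCount(id) {
  return shards.reduce(
    (n, s) => n + s.docs.filter((d) => d._id === id).length,
    0
  );
}

console.log(filteredFind("x1").length); // 1
console.log(unfilteredCount("x1")); // 2
```

      If this assumption held, comparing per-shard results for the same "_id" would show which shard holds the extra copy.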
      

      What is really impacting my customers is that when I execute the aggregation shown below in my application, some documents are returned twice. One MongoDB GUI shows the same result, but in others the document is not returned twice (see the attached pictures aggregation_*).

      db.investigation_cards.aggregate(
      [
      	{ "$match" : { _id : ObjectId("5988e4ea8584c230ad486e43")}},
      	{ "$project" : { _id : 1, "pid" : 1}},
      ]
      )
      

        1. aggregation_gui1.png (11 kB)
        2. aggregation_gui2.png (17 kB)
        3. aggregation_shell.png (6 kB)
        4. explain_primary_aggregation.json (35 kB)
        5. explain_primary_find.json (43 kB)
        6. explain_secondary_aggregation.json (32 kB)
        7. explain_secondary_find.json (30 kB)
        8. queries.7z (6 kB)
        9. sh.status.tar.gz (4.43 MB)

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Lucas (lucasoares)
            Votes: 3
            Watchers: 14

              Created:
              Updated:
              Resolved: