Core Server / SERVER-3645

Sharded collection counts (on primary) can report too many results

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: planned but not scheduled
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Summary

      Count does not filter out unowned (orphaned) documents and can therefore report larger values than one will find via a normal query, or using itcount() in the shell.
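      As an illustration, a minimal mongo shell check (the collection name and query here are hypothetical) that surfaces the discrepancy described above:

          // Run through a mongos against a sharded collection.
          // count() tallies documents physically present on each shard, which can
          // include orphans; itcount() iterates the cursor, and regular queries
          // filter out unowned documents, so it reflects the true result size.
          var fast = db.coll.count();              // may over-report
          var accurate = db.coll.find().itcount(); // iterates every returned document
          if (fast !== accurate) {
              print("count() over-reports by " + (fast - accurate) + " document(s)");
          }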

      Causes

      The following conditions can lead to counts being off:

      • Active migrations
      • Orphaned documents (left from failed migrations)
      • Non-Primary read preferences (see SERVER-5931)

      Workaround

      A workaround to get accurate counts is to ensure that all migrations have been cleaned up and that no migrations are active. To query non-primaries, you must additionally ensure that there is no replication lag, including any in-flight migration data, on top of the above requirements.
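      A minimal sketch of that workaround in the mongo shell, assuming MongoDB 2.6+ (where the cleanupOrphaned command is available); "mydb.mycoll" is a placeholder namespace, and step 3 must be run while connected directly to each shard's primary, not through a mongos:

          // 1. Disable the balancer so no new migrations can start.
          sh.stopBalancer();

          // 2. Wait for any in-flight migration round to finish.
          while (sh.isBalancerRunning()) { sleep(1000); }

          // 3. On EACH shard primary, delete orphaned documents left behind by
          //    failed or aborted migrations, iterating range by range.
          var next = {};
          while (next) {
              var res = db.adminCommand({ cleanupOrphaned: "mydb.mycoll",
                                          startingFromKey: next });
              next = res.stoppedAtKey; // null once the namespace is clean
          }

          // 4. count() through a mongos should now agree with itcount().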

      Non-Primary Reads

      For issues with counts/reads from non-primaries, please see SERVER-5931.


          Activity

Srinivas Mutyala added a comment -

Additional issues with a sharded set-up while it is sharding:

1) After the MongoDB initial sync, the numbers of documents on the individual shards are nearly equal, but not exactly.

2) While the sync is in progress, querying the total number of documents through the mongos (router) returns inconsistent results; ideally it should be the same number every time. If the issue is limited to count(), that is fine, but what about data consistency?

3) Initial sync time is directly proportional to the total data size and is very slow. This needs a fix.

sam flint added a comment -

You can use explain() to capture the correct count, and it is much faster than itcount(). We call explain().n from our client-side application. As you can see, it is accurate and faster than itcount():
          "cursor" : "BtreeCursor client_id_1_lists_1_order_1",
          "n" : 5487153,
          "nChunkSkips" : 17072,
          "nYields" : 11907,
          "nscanned" : 5672905,
          "nscannedAllPlans" : 5672905,
          "nscannedObjects" : 5672905,
          "nscannedObjectsAllPlans" : 5672905,
          "millisShardTotal" : 69749,
          "millisShardAvg" : 9964,
          "numQueries" : 7,
          "numShards" : 7,
          "millis" : 18282
          }
          mongos> db.profile.find(

          {client_id : 3762}

          ,

          {client_id:1}

          ).count()
          5503724
          mongos> db.profile.find(

          {client_id : 3762}

          ,

          {client_id:1}

          ).itcount()
          5487153
          mongos> db.profile.find(

          {client_id : 3762}

          ,

          {client_id:1}

          ).explain().n
          5487153

Kevin J. Rice added a comment -

          @Sam Flint - THANK YOU, that explain().n saves a bunch of time! Great hint!

Jon Hyman added a comment -

Do you know if this is going to make it into 2.8? We have a 6-hour balancer window, and our counts can be wrong during 25% of the day because of it.

Patrick Lauber added a comment -

This bug cost me almost a week as I tried to understand why my import numbers differed from my source...


People

• Votes: 34
• Watchers: 58

Dates

• Days since reply: 8 weeks, 1 day