Core Server / SERVER-3645

Sharded collection counts (on primary) can report too many results

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.1 Desired
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Driver Changes:
      Not Needed

      Description

      Summary

      count() does not filter out unowned (orphaned) documents and can therefore report larger values than a normal query, or itcount() in the shell, will return.

      Causes

      The following conditions can lead to counts being off:

      • Active migrations
      • Orphaned documents (left from failed migrations)
      • Non-Primary read preferences (see SERVER-5931)
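
      The effect of orphaned documents on a count can be sketched with a small in-memory model. This is only an illustration in plain JavaScript, not MongoDB's implementation: the `shards` array, the `owns` key ranges, and the helper functions are all hypothetical names made up for this example.

```javascript
// Hypothetical in-memory model of a two-shard cluster (not a MongoDB API).
// Each shard stores documents keyed by shard key `k` and owns the range [min, max).
const shards = [
  // k=70 on shardA is an orphan, e.g. left behind by a failed migration.
  { name: "shardA", owns: [0, 50], docs: [{ k: 10 }, { k: 20 }, { k: 70 }] },
  { name: "shardB", owns: [50, 100], docs: [{ k: 60 }, { k: 70 }, { k: 90 }] },
];

// True when the shard owns the range containing the given key.
function ownsKey(shard, key) {
  const [min, max] = shard.owns;
  return key >= min && key < max;
}

// Count without ownership filtering: every stored document is counted.
function unfilteredCount(cluster) {
  return cluster.reduce((sum, s) => sum + s.docs.length, 0);
}

// Count with ownership filtering: documents the shard does not own are skipped.
function filteredCount(cluster) {
  return cluster.reduce(
    (sum, s) => sum + s.docs.filter((d) => ownsKey(s, d.k)).length,
    0
  );
}

console.log(unfilteredCount(shards)); // 6
console.log(filteredCount(shards));   // 5
```

      In this model the unfiltered total is one higher than the filtered total because the orphaned copy of k=70 on shardA is counted as well.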

      Workaround

      A workaround to get accurate counts is to ensure that all failed migrations have been cleaned up and that no migrations are active. To get accurate counts from non-primaries, you must additionally ensure that there is no replication lag, including for any migration data.

      Non-Primary Reads

      For issues with counts or reads from non-primaries, please see SERVER-5931.

          Activity

          asya Asya Kamsky added a comment -

          This ticket tracks count(), which uses collection metadata to quickly get the total number of documents.

          A different ticket tracks the fact that when you query secondaries with a broadcast query (i.e. untargeted, not involving the shard key) and there is either a migration in progress or orphaned documents left from an aborted migration, the secondary doesn't know to filter them out the way the primary would. That ticket is https://jira.mongodb.org/browse/SERVER-5931. The workaround of reading from primaries when using non-targeted queries will work for you. If you are using targeted queries (ones that include the shard key), then this should not be a problem whether you are on primaries or secondaries.

          srinivas.mutyala@citi.com Srinivas Mutyala added a comment -

          Additional issues with a sharded setup while it is sharding:

          1) After the MongoDB initial sync, the numbers of documents in the individual shards are nearly equal, but not exactly equal.

          2) While the sync is in progress, if we query the total number of documents through the mongos (router), we see varying results. Ideally, it should be the same number every time. If the issue is only with count(), that is fine, but what about data consistency?

          3) Initial sync time is directly proportional to the total data size and is very slow. This needs a fix.

          sflint@sailthru.com sam flint added a comment -

          You can use explain() to capture the correct count, and this is much faster than itcount(). We put this in our client-side application, which calls explain().n.
          As you can see, it is accurate and it is faster than itcount().
          "cursor" : "BtreeCursor client_id_1_lists_1_order_1",
          "n" : 5487153,
          "nChunkSkips" : 17072,
          "nYields" : 11907,
          "nscanned" : 5672905,
          "nscannedAllPlans" : 5672905,
          "nscannedObjects" : 5672905,
          "nscannedObjectsAllPlans" : 5672905,
          "millisShardTotal" : 69749,
          "millisShardAvg" : 9964,
          "numQueries" : 7,
          "numShards" : 7,
          "millis" : 18282
          }
          mongos> db.profile.find({client_id : 3762}, {client_id:1}).count()
          5503724
          mongos> db.profile.find({client_id : 3762}, {client_id:1}).itcount()
          5487153
          mongos> db.profile.find({client_id : 3762}, {client_id:1}).explain().n
          5487153
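
          To put the gap between the two results in the shell session above into perspective, the arithmetic can be sketched as below. The figures are copied from that session; the variable names are chosen only for this illustration.

```javascript
// Figures copied from the shell session above.
const countResult = 5503724;   // find(...).count()
const itcountResult = 5487153; // find(...).itcount() and explain().n

// Number of extra documents reported by count().
const overcount = countResult - itcountResult;

// The same gap as a percentage of the itcount() result.
const overcountPct = (overcount / itcountResult) * 100;

console.log(overcount);               // 16571
console.log(overcountPct.toFixed(2)); // 0.30
```

          For this query, count() reported 16571 more documents than itcount(), an overcount of roughly 0.3%.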

          justanyone Kevin J. Rice added a comment -

          @Sam Flint - THANK YOU, that explain().n saves a bunch of time! Great hint!

          jonhyman Jon Hyman added a comment -

          Do you know if this is going to make it into 2.8? We have a 6 hour balancer window and our counts can be wrong during 25% of the day due to it.


            People

            • Votes: 29
            • Watchers: 56

              Dates

              • Created:
                Updated:
                Days since reply:
                15 weeks, 5 days ago
                Date of 1st Reply: