Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-13041

count queries that can be index only are slower in 2.6

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.6.0-rc0
    • Fix Version/s: 2.6.0-rc2
    • Component/s: Performance, Querying
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      Create a 10M doc collection with 3 columns (_id, c1, c2). Values inserted were from (0,0,0), (1,1,1), .... to (9999999, 9999999, 9999999).

      import pymongo
      import os
      import time
       
      def f(id, nrows):
          print 'hello', id, nrows
          client = pymongo.Connection(w=0)
          db = client.i
          coll = db.i
          for batch in xrange(nrows / 10000):
            docs = []
            for x in xrange(10000):
              row = (batch * 10000) + x
              docs.append({"_id": row, "c1":row, "c2":row})
            coll.insert(docs)
            print 'inserted ', batch
       
      if __name__ == '__main__':
          f(1, 10000000)

      Then create an index on c1...

      db.i.ensureIndex({c1:1})

      Then run these queries...

      use i
      db.i.find().count()
      db.i.find({ _id : { $gt : 4000000 } }).count()
      db.i.find({ c1 : { $gt : 4000000 } }).count()
      db.i.find({ c1 : { $gt : 4000000 } }).hint({ c1 : 1 }).count()
      db.i.find({ c1 : { $gt : 4000000 } }, { _id:0 }).count()
      db.i.find({ c1 : { $gt : 4000000 } }, { _id:0 }).hint({ c1 : 1 }).count()
       
      db.i.find({ c2 : { $gt : 4000000 } }).count()
      db.i.find({ c2 : { $gt : 4000000 } }, { _id:0 }).count()

      then look at the mongod error log for slow query results.

      Show
      Create a 10M doc collection with 3 columns (_id, c1, c2). Values inserted were from (0,0,0), (1,1,1), .... to (9999999, 9999999, 9999999). import pymongo import os import time   def f(id, nrows): print 'hello', id, nrows client = pymongo.Connection(w=0) db = client.i coll = db.i for batch in xrange(nrows / 10000): docs = [] for x in xrange(10000): row = (batch * 10000) + x docs.append({"_id": row, "c1":row, "c2":row}) coll.insert(docs) print 'inserted ', batch   if __name__ == '__main__': f(1, 10000000) — Then create an index on c1... db.i.ensureIndex({c1:1}) — Then run these queries... use i db.i.find().count() db.i.find({ _id : { $gt : 4000000 } }).count() db.i.find({ c1 : { $gt : 4000000 } }).count() db.i.find({ c1 : { $gt : 4000000 } }).hint({ c1 : 1 }).count() db.i.find({ c1 : { $gt : 4000000 } }, { _id:0 }).count() db.i.find({ c1 : { $gt : 4000000 } }, { _id:0 }).hint({ c1 : 1 }).count()   db.i.find({ c2 : { $gt : 4000000 } }).count() db.i.find({ c2 : { $gt : 4000000 } }, { _id:0 }).count() — then look at the mongod error log for slow query results.

      Description

      Count queries that can be index only are slower in 2.6 rc0 than in 2.4.9. For a 10M doc collection with an index on c1 I ran these queries. Queries 1,2 can use the PK on _id. Queries 2,3,4,5 can use the index on c1. Queries 6,7 should do a full scan.

      1) db.i.find({ _id : { $gt : 4000000 } }).count()
      2) db.i.find({ c1 : { $gt : 4000000 } }).count()
      3) db.i.find({ c1 : { $gt : 4000000 } }).hint({ c1 : 1 }).count()
      4) db.i.find({ c1 : { $gt : 4000000 } }, { _id:0 }).count()
      5) db.i.find({ c1 : { $gt : 4000000 } }, { _id:0 }).hint({ c1 : 1 }).count()
      6) db.i.find({ c2 : { $gt : 4000000 } }).count()
      7) db.i.find({ c2 : { $gt : 4000000 } }, { _id:0 }).count()

      I ran the set of queries twice and looked at times for the second run. All data is cached in RAM.

      For 2.4.9 the times are ~750 miillis each for the first 5 queries and then ~2.4 seconds for the last 2 queries.

      For 2.6 rc0 the times are ~2 seconds each for the first 5 queries and then ~2.9 seconds for the last 2 queries that should do a full scan. The index-only count queries are much slower in 2.6. Tull scan count queries are also slower (2.9 vs 2.4 seconds) but the difference is less significant.

      Alas, I can't do find().count().explain()

      Sorry for mixing javascript and python, I am new to javascript. I also don't have the latest version of pymongo.

        Attachments

        1. c2.js
          0.4 kB
        2. i.py
          0.4 kB

          Activity

            People

            Assignee:
            hari.khalsa@10gen.com hari.khalsa@10gen.com
            Reporter:
            mdcallag Mark Callaghan
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: