Core Server / SERVER-2649

Poor remove() performance (tested for pymongo only)

    • Type: Improvement
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 1.7.6
    • Component/s: Performance
    • Environment:
      Ubuntu 10.10 (amd64)

      1. In my original case I issued the command c.db.coll.remove({'str_key': 'myvalue'}), which removed about 200,000 records from a collection containing 2,000,000 records. Removing these 10% of the records took 1 hour. The field str_key was indexed, each record was about 10 KB, and about 13 GB of RAM was unused by any application and available for MongoDB's mmap.
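
      A back-of-the-envelope calculation from the figures reported above (200,000 documents of ~10 KB each, deleted in one hour; the numbers are approximate) shows how low the effective delete throughput is:

      ```python
      # rough throughput estimate from the reported figures
      records = 200000          # documents removed
      record_size = 10000       # ~10 KB each (approximate)
      seconds = 3600            # the remove() took about 1 hour

      total_bytes = records * record_size              # ~2 GB of document data
      mb_per_sec = total_bytes / float(seconds) / 1e6
      print(round(mb_per_sec, 2))  # ~0.56 MB/s of document data
      ```

      Even allowing for index maintenance, that is far below what the disk and mmap layer can sustain, which is what makes the slowdown surprising.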

      2. I checked how remove() behaves in an oversimplified case (see below). The insertions took 312 sec, whereas the removal took 994 sec. So removals are much slower, even though hardly any serialized data is pushed between the server and the client.
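
      Put differently, taking the timings at face value (and assuming that, since the limit argument was ignored, all 1,000,000 documents were actually removed), each removal is roughly three times slower than each insertion:

      ```python
      # per-document rate comparison, assuming all 1,000,000 docs were removed
      inserts, insert_secs = 1000000, 312
      removes, remove_secs = 1000000, 994

      slowdown = (inserts / float(insert_secs)) / (removes / float(remove_secs))
      print(round(slowdown, 1))  # removals ~3.2x slower per document
      ```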

      3. see also:
      http://groups.google.com/group/mongodb-user/browse_thread/thread/95f9386cd57003e4
      http://groups.google.com/group/mongodb-user/browse_thread/thread/5d5dd12e37382b5b
      http://groups.google.com/group/mongodb-user/browse_thread/thread/5a7033248bbe362d

      (4. "limit=100000" doesn't seem to work in the example below, but I hope this is not important. Also, perhaps I should have passed safe=True for more transparency. Finally, in my original case it was "cold" I/O – the records had not been prefetched from disk into RAM before remove() was invoked. If all this is still important, I could try to reproduce my original issue more closely, but IMHO the case below already shows the unexpected slowdown well enough.)

      ####################################
      import unittest
      from debug.decorators import timeit  # reporter's local timing helper

      class Test(unittest.TestCase):
          def testName(self):
              from pymongo import Connection
              c = Connection()
              dummy_str = 'a' * 10000
              c.drop_database('test_remove_performance')

              @timeit
              def insert_many():
                  for i in range(1000000):
                      c.test_remove_performance.coll.insert({'dummy_str': dummy_str})

              @timeit
              def remove_some():
                  # note: the "limit" keyword does not seem to take effect
                  c.test_remove_performance.coll.remove({}, limit=100000)

              insert_many()
              remove_some()
              print c.test_remove_performance.coll.count()

      if __name__ == "__main__":
          unittest.main()
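
      The timeit decorator imported from debug.decorators appears to be a local helper of the reporter; a minimal stand-in (a sketch, not the original helper) that prints each call's wall-clock duration could look like this:

      ```python
      import time
      from functools import wraps

      def timeit(func):
          # print the wall-clock duration of each call,
          # as the reporter's helper presumably does
          @wraps(func)
          def wrapper(*args, **kwargs):
              start = time.time()
              result = func(*args, **kwargs)
              print('%s took %.1f sec' % (func.__name__, time.time() - start))
              return result
          return wrapper

      @timeit
      def example():
          return sum(range(100000))

      example()
      ```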

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            vak (Valery Khamenya)
            Votes:
            10
            Watchers:
            12
