- Type: Improvement
- Resolution: Incomplete
- Priority: Major - P3
- None
- Affects Version/s: 1.7.6
- Component/s: Performance
- None
- Environment: Ubuntu 10.10 (amd64)
Storage Execution
1. In my original case I issued the command c.db.coll.remove({'str_key': 'myvalue'}), which removed about 200,000 records from a collection of 2,000,000 records. Removing these 10% of the records took 1 hour. The field str_key was indexed. Each record was about 10 Kbytes in size. About 13 Gbytes of RAM were not used by any application and were available for mongodb's mmap.
2. I checked how remove() behaves in an oversimplified case (see below). The insertions took 312 sec, whereas the removal took 994 sec. So removals are much slower, even though not much serialized data is pushed between the server and the client.
3. See also:
http://groups.google.com/group/mongodb-user/browse_thread/thread/95f9386cd57003e4
http://groups.google.com/group/mongodb-user/browse_thread/thread/5d5dd12e37382b5b
http://groups.google.com/group/mongodb-user/browse_thread/thread/5a7033248bbe362d
(4. "limit=100000" doesn't seem to work in the example below, but I hope this is not important. Also, perhaps I should have set safe=True for more transparency. Finally, in my case it was "cold" I/O: the records had not been prefetched from disk into RAM before remove() was invoked. If this is all still important, I could try to reproduce my issue again more closely, but IMHO the case below still shows an unexpected slowdown well enough.)
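To put the figures above side by side, here is a small back-of-the-envelope calculation. It uses only the numbers quoted in points 1 and 2; the variable names are illustrative, and the byte rate assumes the approximate 10 Kbytes per record stated in point 1.

```python
# Figures quoted in the report.
removed_records = 200000    # case 1: records matched by remove()
removal_seconds = 60 * 60   # case 1: "1 hour"
record_kbytes = 10          # case 1: approximate record size

insert_seconds = 312.0      # case 2: time spent inserting
remove_seconds = 994.0      # case 2: time spent removing

# Case 1: removal throughput in records and in (approximate) Kbytes.
records_per_second = removed_records / float(removal_seconds)
kbytes_per_second = records_per_second * record_kbytes

# Case 2: how much longer the removal took than the insertion.
slowdown = remove_seconds / insert_seconds

print("case 1: %.1f records/s, about %.0f Kbytes/s"
      % (records_per_second, kbytes_per_second))
print("case 2: remove took %.2fx as long as insert" % slowdown)
```

This prints roughly 55.6 records/s (about 556 Kbytes/s) for the original case, and a factor of 3.19 for the simplified case.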
####################################
import unittest
from debug.decorators import timeit


class Test(unittest.TestCase):

    def testName(self):
        from pymongo import Connection
        c = Connection()
        dummy_str = 'a' * 10000  # 10,000-character payload string
        c.drop_database('test_remove_performance')  # start from an empty database

        @timeit
        def insert_many():
            for i in range(1000000):
                # the document argument is missing from the report as posted
                c.test_remove_performance.coll.insert(
                )

        @timeit
        def remove_some():
            c.test_remove_performance.coll.remove({}, limit=100000)

        insert_many()
        remove_some()
        # number of documents left after the removal
        print c.test_remove_performance.coll.count()


if __name__ == "__main__":
    unittest.main()
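
The script imports timeit from debug.decorators, which appears to be the reporter's own module and is not included in the report. To run the script without it, a stand-in along the following lines should be enough. This is a minimal sketch and an assumption about what the decorator does (print the wall-clock time of each call); the output format is invented for illustration.

```python
import functools
import time


def timeit(func):
    """Hypothetical stand-in for debug.decorators.timeit.

    Prints the wall-clock time of each call and returns the
    wrapped function's result unchanged.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Report the elapsed time even if the call raised.
            elapsed = time.time() - start
            print("%s took %.3f sec" % (func.__name__, elapsed))
    return wrapper
```

With this definition placed in the test file in place of the debug.decorators import, insert_many() and remove_some() each print one timing line when called.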