[SERVER-7092] Delete by query is really slow Created: 21/Sep/12  Updated: 08/Mar/13  Resolved: 28/Sep/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.0.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Steven Casey Assignee: Stephen Lee
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ubuntu 12.04,
Hardware: Intel(R) Xeon(R) CPU E5606 @ 2.13GHz, 2133MHz, 4 Core, Sockets: 2
RAM: 32230MB
RAID: Level 10 Disks: 4 Size: 1849GB Type: SATA


Attachments: Text File iostat_-xmdh_2.txt     Text File mongostat_server1.txt     Text File mongostat_server2.txt    
Operating System: Linux
Participants:

 Description   

I am trying to delete approximately 8 million documents from a 21 million document collection; it's taking ages and appears to be very slow.

I am running a js file locally on the server (mongo localhost:27018 remove.js) that contains:
collection.remove({'dts':{'$gte':ISODate('2012-06-21T00:00:00Z')}});

I only have two indexes:
one on _id,
one on the above date field and id.

Disk I/O is maxed out at 100%; please see the attached iostat file.

I have an MMS account; please let me know what further details you need if you can take a look. The server is currently running the delete operation.

I also have another shard on which I am doing a delete, but there I am instead trying:

var c = collection.find( {'dts':{'$gte':ISODate('2012-06-21T00:00:00Z')}}, {'_id':1} );

and

c.forEach( function (id) { collection.remove( {_id: id._id} ); } );

It's also maxing out I/O at 100%, with a very slow rate of deletes/sec.

Is there a recommended way to delete a large quantity of items?
Am I doing something blatantly wrong?
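
For reference, here is a minimal sketch (not from this ticket) of the per-_id approach above with the removes batched via $in; the field and collection names are reused from above, and the batch size of 1000 is an arbitrary assumption:

var batch = [];
db.collection.find( {'dts':{'$gte':ISODate('2012-06-21T00:00:00Z')}}, {'_id':1} ).forEach( function ( doc ) {
    batch.push( doc._id );
    if ( batch.length >= 1000 ) {
        db.collection.remove( { _id: { $in: batch } } );   // one remove per 1000 _ids
        batch = [];
    }
} );
if ( batch.length > 0 ) {
    db.collection.remove( { _id: { $in: batch } } );       // remove the final partial batch
}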



 Comments   
Comment by Stephen Lee [ 28/Sep/12 ]

Steven, if you continue to experience performance issues, please reopen this ticket or create a new one.

Comment by Stephen Lee [ 21/Sep/12 ]

I looked at MMS, and the lock % looks pretty reasonable considering the 8M removes.

Sharding doesn't allow us to rename collections, so db.collection.remove() might be the only way to go.
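
One way to reduce the impact of a single large remove on a sharded collection (a sketch only, not from this ticket; the end date below is a hypothetical placeholder) is to walk the date range in smaller chunks and wait for each chunk to finish:

var day   = 24 * 60 * 60 * 1000;
var start = ISODate('2012-06-21T00:00:00Z');
var end   = ISODate('2012-09-22T00:00:00Z');   // hypothetical: latest dts value to remove
for ( var d = start; d < end; d = new Date( d.getTime() + day ) ) {
    db.collection.remove( { dts: { $gte: d, $lt: new Date( d.getTime() + day ) } } );
    db.getLastError();   // wait for this chunk's remove to be applied before starting the next one
}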

For a standalone or replica set instance, since 8M removes might introduce a lot of fragmentation, it might be better to copy the documents matching {'dts':{'$lt':ISODate('2012-06-21T00:00:00Z')}} into a new collection, recreate the indexes on the new collection, and then rename the new collection to the old collection's name with dropTarget set, which drops the old collection as part of the rename.

// copy documents we want to keep
db.collection.find( {'dts':{'$lt':ISODate('2012-06-21T00:00:00Z')}} ).forEach( function( x ) {
    db.new.insert( x );
} );
 
// create indexes on new collection
db.collection.getIndexes().forEach( function( i ) {
    db.new.ensureIndex( i.key );
} );
 
db.new.renameCollection( db.collection.getName(), true );   // second argument is the dropTarget flag: drop the old collection as part of the rename
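
As a quick sanity check before the rename (assuming the same collection names as above), the count of documents being kept can be compared with the count in the new collection:

db.collection.count( {'dts':{'$lt':ISODate('2012-06-21T00:00:00Z')}} );
db.new.count();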

Comment by Steven Casey [ 21/Sep/12 ]

To try to rule out the hardware as the cause, I ran hdparm (after shutting down mongod first):

hdparm -t /dev/vda

/dev/vda:
Timing buffered disk reads: 700 MB in 3.00 seconds = 232.97 MB/sec

Comment by Steven Casey [ 21/Sep/12 ]

server1: sp-a1
server2: sp-a2

MMS Group name: Buzz Numbers
