[SERVER-5493] Delete by query is really slow Created: 03/Apr/12 Updated: 11/Jul/16 Resolved: 31/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 2.0.4 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Nic Cottrell (Personal) | Assignee: | Gregor Macadam |
| Resolution: | Done | Votes: | 3 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Linux RHEL5 |
| Attachments: | |
| Participants: | |
| Description |
|
I have a collection with 35M documents with an average size of 430 bytes. There are several indexes, including {f: 1}, and when I run a count on {"f": 0, wc: {$gt: 4}} I get about 530k matches - that took a minute or so. Next I tried a delete using the same query on the live system, which is not under much load right now. After over 8 hours it is still running. I see a lot of wait time in top, so I guess it's I/O bound. The machine has 70G of RAM and the filesystem is ext4 on RAID5. |
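For reference, a minimal mongo shell sketch of the two operations described above; the collection name coll is an assumption, since the report only gives the query:

    // count the matching documents (query taken from the description)
    db.coll.count({"f": 0, wc: {$gt: 4}})

    // delete by the same query; in 2.0.x each matching document is removed
    // individually, and every index entry for it must be updated as well
    db.coll.remove({"f": 0, wc: {$gt: 4}})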
| Comments |
| Comment by Gregor Macadam [ 31/Aug/12 ] |
|
Hi Stefan - closing this issue. |
| Comment by Scott Hernandez (Inactive) [ 06/May/12 ] |
|
Stefan, please create a new issue with your information, and include a link to this one. Include all the information about your system and collections, as well as the old version you were using. |
| Comment by Stefan Kutko [ 06/May/12 ] |
|
We're experiencing the same issue. Deleting by query using 2.0.3 takes SIGNIFICANTLY longer than in previous versions. Not sure what changed, but +1 for a resolution to this issue. |
| Comment by Eliot Horowitz (Inactive) [ 22/Apr/12 ] |
|
With that many indexes, a capped collection won't really help much. |
| Comment by Nic Cottrell (Personal) [ 19/Apr/12 ] |
|
I also have a 12-hourly batch delete of old logs. This collection has about 1.5M objects with an average size of 1500 bytes. It has a large number of indexes (around fifteen of them). My delete query is simply {start: {$lt: SOMEDATE}}. It deletes about 80k objects each run, taking 20 minutes (1200s) - that's roughly 65-70 objects a second. Is it all the indexes making this slow? I guess I should really just be using capped collections for something like this, shouldn't I? |
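As a rough sketch of the batch delete described above (the collection name logs and the cutoff value are assumptions; only the query shape {start: {$lt: SOMEDATE}} is given in the comment):

    // remove all log documents whose start date is older than the cutoff
    var cutoff = new Date("2012-04-01");   // hypothetical cutoff date
    db.logs.remove({start: {$lt: cutoff}})

    // list the indexes on the collection; each deleted document requires an
    // update to every one of these indexes, which is what makes deletes
    // expensive on heavily indexed collections
    db.logs.getIndexes()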
| Comment by Nic Cottrell (Personal) [ 17/Apr/12 ] |
|
I'm running my own setup - a Dell PowerEdge with 40GB of RAM and a hardware RAID-5 with 64K stripes. I now have Munin running and the disk data is interesting/strange: I've been running many batch processes this week, mostly setting a field to a new value, and in some cases inserting new fields into existing objects. |
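A hedged sketch of the kind of batch updates described here, with hypothetical field names; the 2.0-era update() signature is (query, update, upsert, multi):

    // set a field to a new value on every matching document
    db.coll.update({status: "old"}, {$set: {status: "new"}}, false, true)

    // add a new field to existing documents that don't have it yet
    db.coll.update({flags: {$exists: false}}, {$set: {flags: 0}}, false, true)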
| Comment by Scott Hernandez (Inactive) [ 07/Apr/12 ] |
|
It is collecting data again - it just took a few minutes for the data to show up. Looking at your numbers, your page faults and disk I/O time are not minor; the system is running hot. What types of disks/storage are you using? Can you let us know the next time you delete a lot of documents and see this slowdown again? Then we can look at the MMS info for that time period to diagnose the possible underlying issue. |
| Comment by Nic Cottrell (Personal) [ 05/Apr/12 ] |
|
Strange - I definitely have the agent running, but agent.log shows just:
2012-04-05 20:17:09,655 INFO Starting agent process
I tried telnetting to mms.10gen.com on port 443 and got through the firewall OK, so I'm not sure why else it isn't pinging. According to settings.py, I have 1.3.7 - is that too old? |
| Comment by Nic Cottrell (Personal) [ 05/Apr/12 ] |
|
Ok - I think I have the agent up and running again, but I'm having trouble logging into mms.10gen.com. It's telling me I need to choose a group or ask an admin to do it, but I should be the only associated account. |
| Comment by Scott Hernandez (Inactive) [ 05/Apr/12 ] |
|
Yes, it is capitalized, which is not an issue, but no agent has reported in since 3/28. Do you believe your MMS setup is working correctly? It would be very helpful to start collecting stats in MMS again: please fix your deployment and start an agent, or, if one is running, fix its ability to communicate with the internet. |
| Comment by Nic Cottrell (Personal) [ 05/Apr/12 ] |
|
My MMS alerts say "Group: Transmachina" so I guess it's capitalized? |
| Comment by Scott Hernandez (Inactive) [ 04/Apr/12 ] |
|
It looks like there has been no data in MMS since 3/28. Is this the correct group in MMS? https://mms.10gen.com/host/list/4eb94a93ae6429bfa4101305 |
| Comment by Nic Cottrell (Personal) [ 04/Apr/12 ] |
|
I believe this instance is registered in MMS in the group "transmachina" |
| Comment by Nic Cottrell (Personal) [ 04/Apr/12 ] |
|
Sure - after 16 hours it's still running and now other queries are piling up. Attached are mongostat, top, and clients/queries output. |
| Comment by Eliot Horowitz (Inactive) [ 04/Apr/12 ] |
|
Can you send mongostat and hardware stats while the delete is happening? |
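For anyone reproducing this, a hedged sketch of what can be captured from the mongo shell while the delete is in flight; mongostat, iostat, and top would be run separately from the OS shell:

    // list in-progress operations, including the long-running remove
    db.currentOp()

    // server-wide counters such as page faults and background flushing times
    db.serverStatus()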