[SERVER-1780] "doing delete inline" blocks the whole cluster Created: 12/Sep/10 Updated: 16/Nov/21 Resolved: 30/Sep/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.6.2, 1.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergei Tulentsev | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Linux |
| Participants: |
| Description |
|
In the best of cases it takes at least several seconds. When the amount of data is substantial, it can block for more than an hour, rendering the whole cluster useless and making queries pile up in the queue. It does some very intensive I/O. Does it do data compaction of some sort? Since our chunks of data are roughly the same size, could we just mark the space as free and then rewrite it later? |
| Comments |
| Comment by Guanhai Wang [ 09/Jul/12 ] |
|
Hi Scott, I am very sorry that I didn't give you any useful details. The sharded cluster is a production one with high traffic. I didn't capture those stats and logs when the problem occurred this afternoon, and I am not going to reproduce the problem. I have switched the activeWindow to low-traffic times. If it appears again, I will let you know. Thank you very much! |
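For readers unfamiliar with the activeWindow mentioned above: the balancer window lives in the `balancer` document of the `settings` collection in the config database. The following is a minimal sketch of building that update in Python, not what the commenter actually ran. The helper name `balancer_window_update` and the example times are hypothetical, and the commented-out pymongo call assumes a driver connection to a mongos.

```python
import re

# Matches HH:MM in 24-hour time, e.g. "23:00" or "06:30".
_HHMM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def balancer_window_update(start, stop):
    """Build the (filter, update) pair that sets the balancer activeWindow.

    Hypothetical helper: it only constructs the documents; nothing is sent
    to a server.
    """
    for value in (start, stop):
        if not _HHMM.match(value):
            raise ValueError("expected HH:MM (24-hour), got %r" % (value,))
    query = {"_id": "balancer"}
    update = {"$set": {"activeWindow": {"start": start, "stop": stop}}}
    return query, update


# Example times are illustrative only; pick your own low-traffic hours.
query, update = balancer_window_update("02:00", "06:00")

# Applying it with pymongo against a mongos (assumed setup, not run here):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")  # hypothetical address
#   client["config"]["settings"].update_one(query, update, upsert=True)
```

Restricting the window only moves chunk migrations (and the deletes that follow them) to quieter hours; it does not make the delete itself cheaper.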
| Comment by Scott Hernandez (Inactive) [ 09/Jul/12 ] |
|
Guanhai, please open a new issue with stats and logs. Please include iostat -xmt 2, mongostat and vmstat numbers during the period for all members involved, as well as the logs from those members and the mongos instances. In addition, please include a mongodump of the config database after the event. A timeline would be very useful, so please call out what happened and when, as you experienced it from the user's/application's perspective. |
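One way to have those numbers ready before the next occurrence is to leave the samplers running in the background on each member. The sketch below is an assumption-laden illustration rather than an official procedure: only `iostat -xmt 2` comes from the request above, the 2-second interval for vmstat and mongostat is chosen to match it, and the function names and log-file naming are hypothetical.

```python
import os
import subprocess
import time


def capture_commands(interval=2):
    """Return the sampling commands, keyed by a short label.

    Only `iostat -xmt 2` is taken from the request above; the vmstat and
    mongostat intervals are assumptions chosen to match it.
    """
    step = str(interval)
    return {
        "iostat": ["iostat", "-xmt", step],
        "vmstat": ["vmstat", step],
        "mongostat": ["mongostat", step],
    }


def start_captures(outdir, interval=2):
    """Start each sampler in the background, one log file per tool.

    Returns a list of (process, file) pairs; pass it to stop_captures().
    """
    os.makedirs(outdir, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    running = []
    for label, argv in capture_commands(interval).items():
        log = open(os.path.join(outdir, "%s-%s.log" % (label, stamp)), "w")
        proc = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
        running.append((proc, log))
    return running


def stop_captures(running):
    """Terminate the samplers and close their log files."""
    for proc, log in running:
        proc.terminate()
        proc.wait()
        log.close()
```

Call `start_captures("/some/log/dir")` on each member before the balancer window opens and `stop_captures(...)` afterwards. mongostat may need host and port options if mongod is not listening on the default port.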
| Comment by Guanhai Wang [ 09/Jul/12 ] |
|
When the balancer was migrating a chunk from one shard to another, I hit the same issue with MongoDB 1.8.5 (git version: 403c8dadcd56f68dcbe06013ecbfac67b32a22ac). While "doing delete inline" was in progress, all operations on the sharded cluster were blocked for more than twenty minutes. My cluster was receiving about 10,000 commands per second at that time. |
| Comment by Eliot Horowitz (Inactive) [ 03/Nov/10 ] |
|
That's expected. The goal is that it shouldn't be detrimental to the performance of the system. |
| Comment by Chris Chandler [ 03/Nov/10 ] |
|
The total cluster blocking appears to be resolved in 1.6.4, everything just slows down in relation. The source shard however that's having a chunk moved away is still experiencing ~15 minutes of 100% IO utilization for a 200MB chunk. Is this expected or should I file/comment on a bug elsewhere? |
| Comment by Eliot Horowitz (Inactive) [ 03/Nov/10 ] |
|
Can you try 1.6.4? |
| Comment by Chris Chandler [ 03/Nov/10 ] |
|
I'm still seeing this issue on 1.6.3.
Wed Nov 3 12:53:34 MongoDB starting : pid=16617 port=27018 dbpath=/db/var/mongodb/ 64-bit
I see the "doing delete inline" message, and then the utilization reported by iostat -x 2 jumps to 100% for approximately 10-11 minutes. Any attempt to write to the cluster in this window appears to block.
avg-cpu: %user %nice %system %iowait %steal %idle
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util |
| Comment by Eliot Horowitz (Inactive) [ 30/Sep/10 ] |
|
Ok - going to close for now. |
| Comment by Sergei Tulentsev [ 30/Sep/10 ] |
|
It's performing very well, though write load is significantly lower. |
| Comment by Eliot Horowitz (Inactive) [ 30/Sep/10 ] |
|
Is this performing better or are you still having problems? |
| Comment by Sergei Tulentsev [ 30/Sep/10 ] |
|
I am running 1.6.3 now. |
| Comment by Eliot Horowitz (Inactive) [ 30/Sep/10 ] |
|
1.6.3 has a number of the changes. |
| Comment by Sergei Tulentsev [ 29/Sep/10 ] |
|
Sorry, I must have missed your previous comment. What's a stall? I must say that I don't encounter this behaviour anymore, probably because I am not inserting data at that rate. I still have some data to import, though, but I would rather wait for the next stable build. Are you going to merge these changes into it? |
| Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ] |
|
Any updates? |
| Comment by Eliot Horowitz (Inactive) [ 12/Sep/10 ] |
|
Yes that. |
| Comment by Sergei Tulentsev [ 12/Sep/10 ] |
|
You mean this? sergio@cs2592:~$ mongod --version |
| Comment by Eliot Horowitz (Inactive) [ 12/Sep/10 ] |
|
Can you send the startup banner with the git hash? |
| Comment by Sergei Tulentsev [ 12/Sep/10 ] |
|
Yes, this is still happening in the latest nightly (2010-09-10) |
| Comment by Eliot Horowitz (Inactive) [ 12/Sep/10 ] |
|
Is this still happening in the 1.7? |