[SERVER-4684] Severe server slowdown Created: 15/Jan/12 Updated: 15/Jan/12 Resolved: 15/Jan/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | pv1 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Linux |
| Participants: |
| Description |
|
One of our servers started experiencing severe slowdowns (working 10-100x slower) around 17:05 (time for reference only). It continued working, but very slowly, which led the other replica set members to decide it was dead around 17:29. Right around the start of the problem we saw these (there was NO mention of any "killcursors" in the log before that): which became a lot worse later: Maybe our application is mismanaging cursors? At first I thought some kind of hard drive failure was causing this, but simply restarting the server fixed the problem completely, so I don't think it's a hardware problem. My guess is that either some internal structures got damaged, or there is a problem with the PHP driver or our pattern of its usage. I checked dmesg, kern.log, etc. – no signs of anything going wrong with the hardware or the kernel. This problem recurs without any visible cause, please advise. |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 15/Jan/12 ] |
|
there are other cases related to the assert - see the label, so going to close this |
| Comment by Aristarkh Zagorodnikov [ 15/Jan/12 ] |
|
Additional details: we had this problem all along, but the VM guys allowed burst IOPS to be 8x higher for short periods of time. When the load came, the burst was gone and we were stuck with consumer-HDD-grade speeds for all our LVM volumes. |
| Comment by Aristarkh Zagorodnikov [ 15/Jan/12 ] |
|
Sorry for the false alarm. Working with our VM provider showed that there is a hidden binding of IOPS slots to memory slots, so having little memory on the machines (we set 2 GB RAM) led to IOPS being limited to 300 per device, which thrashed everything. Sorry again for this; it appears you can close the case (although that assert might still need some attention). |
| Comment by Aristarkh Zagorodnikov [ 15/Jan/12 ] |
|
It appears that there was a problem with limited memory on the VM. Increasing the amount of RAM and IOPS slots (we have pay-as-you-use virtualization billing) got rid of the problems for now; I will report progress later. |
| Comment by Aristarkh Zagorodnikov [ 15/Jan/12 ] |
|
The PHP driver version is 1.2.6 |
| Comment by Eliot Horowitz (Inactive) [ 15/Jan/12 ] |
|
What version of the PHP driver? |
| Comment by Aristarkh Zagorodnikov [ 15/Jan/12 ] |
|
Query execution time skyrockets after the problem triggers; check this (the longer it goes on, the worse it gets): |
| Comment by Aristarkh Zagorodnikov [ 15/Jan/12 ] |
|
I would like to add that this runs in a Xen virtual environment, using the 2.6.32.36 kernel.