[SERVER-4301] Why does Mongo become slower when RAM is nearly used up Created: 17/Nov/11 Updated: 15/Aug/12 Resolved: 04/Mar/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Client, Performance, Testing Infrastructure |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | wei lu | Assignee: | Tad Marshall |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | Windows, insert |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: |
4 shards on Windows Server 2003; no replica sets, each shard is a single node. C++ driver: aposto/mongodb-cxx-windows-driver (https://github.com/aposto/mongodb-cxx-windows-driver/contributors) compiled with VS2005. |
| Attachments: | |
| Issue Links: | |
| Participants: | |
| Description |
|
When I continuously insert, select, and update a collection, with one thread for each operation, overall performance (opcount) drops when one of the shards is about to run out of RAM. I have also attached a test file here, and I would appreciate any help. You can jump straight to the "Observe the cluster" section for the details of my question. A sketch of the workload shape follows below. |
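A minimal sketch of the workload shape described above, assuming the legacy C++ driver API (DBClientConnection, BSON/QUERY macros); the host, namespace, and document shape are illustrative guesses, and the reporter's actual test program is in the attached cp.zip:

```cpp
// Three threads, one each for insert, select, and update, running
// continuously against one collection.  Illustrative only.
#include "mongo/client/dbclient.h"
#include <boost/thread.hpp>
#include <string>

static void worker(int op) {
    mongo::DBClientConnection c;             // one connection per thread
    std::string err;
    if (!c.connect("localhost:27017", err))  // assumed mongos address
        return;
    for (long long i = 0; ; ++i) {
        if (op == 0)                         // insert thread
            c.insert("test.perf", BSON("_id" << i << "v" << i));
        else if (op == 1)                    // select thread
            c.findOne("test.perf", QUERY("_id" << i));
        else                                 // update thread
            c.update("test.perf", QUERY("_id" << i),
                     BSON("$inc" << BSON("v" << 1)));
    }
}

int main() {
    boost::thread ti(worker, 0), ts(worker, 1), tu(worker, 2);
    ti.join(); ts.join(); tu.join();         // loops run until killed
    return 0;
}
```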
| Comments |
| Comment by Tad Marshall [ 04/Mar/12 ] |
|
Consolidating closely related tickets into one. |
| Comment by Tad Marshall [ 22/Dec/11 ] |
|
I tested running xperf as described above, and at first glance I can't see that it reveals anything. You will get more useful results if you limit the time interval as much as possible and don't run anything else on the system while xperf is collecting data. In 45 minutes of logging, I created a 4.5 GB trace file, but because I was using my system while the test was running, it shows all my activity, not just mongod.exe. I'm not sure whether xperf will be helpful in diagnosing the memory issues. |
| Comment by wei lu [ 05/Dec/11 ] |
|
I am just back from a short vacation; sorry for the late response. |
| Comment by Tad Marshall [ 01/Dec/11 ] |
|
As we started talking to Microsoft about these issues (memory usage, degraded performance), they had some suggestions for research: collecting an xperf trace (a sketch of such a session follows below).
I haven't tried this yet, but I will, and I'll report what I learn. You could try it on your machines and see if it tells you anything useful. Thanks! |
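The specific steps are not preserved in this ticket, but the 22/Dec comment above indicates they involved collecting an xperf trace. A minimal sketch of such a capture session, assuming the Windows Performance Toolkit is installed; the kernel flags here are a common starting point for memory investigations, not necessarily the ones Microsoft proposed:

```
xperf -on PROC_THREAD+LOADER+HARD_FAULTS+VIRT_ALLOC
rem ... reproduce the slowdown over as short an interval as possible,
rem     with nothing else running on the system ...
xperf -d trace.etl
xperf trace.etl
```

Keeping the interval short and the machine otherwise idle keeps the trace small and makes the mongod.exe activity easy to isolate.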
| Comment by wei lu [ 30/Nov/11 ] |
|
The "used up" means that res memory size is about to reach the physical RAM size, and page swap does indeed occurs. The thing is, then more pages swapped in this scenario, the whole performance dropped, as I mentioned in the document. |
| Comment by Tad Marshall [ 30/Nov/11 ] |
|
Thanks for your note. You may be deeper into the relevant code than I am at this point. But it makes little sense that a call to SetProcessWorkingSetSize() should be able to free memory that we have "locked" somehow, and if we haven't "locked" it (I'm using the word "locked" casually; there may not be any explicit lock) then I don't understand why it isn't just "taken" by Windows when it needs physical RAM. Memory should not be "used up" when it can simply be paged out to a file.

You may be right that we need to explicitly control the size of the views that we create. A theory we have had is that the OS knows how to manage memory and we should let it do its job, but we may need to revisit that logic. I'd like to dig a little deeper (maybe catch up with you!) before we start to change code. If you have specific bits of code that you'd like to direct me to, that would be great.

I'm halfway into debugging a different problem with memory-mapped files on Windows and will be distracted by that for a day or two more, but then I'd like to get to the bottom of the out-of-memory issues that you and others have seen on Windows. Both performance and stability are not what they should be under some workloads, and we can and should fix that. |
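For reference, a minimal sketch of the call being discussed, assuming it is made from inside the process whose working set is to be trimmed; this is illustrative Win32 usage, not code from the MongoDB tree:

```cpp
// Ask Windows to remove as many pages as possible from this process's
// working set.  Per the Win32 documentation, passing (SIZE_T)-1 for
// both limits requests an immediate trim.
#include <windows.h>
#include <iostream>

int main() {
    if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                  (SIZE_T)-1, (SIZE_T)-1)) {
        std::cerr << "SetProcessWorkingSetSize failed: "
                  << GetLastError() << std::endl;
        return 1;
    }
    std::cout << "Working set trimmed." << std::endl;
    return 0;
}
```

For pages backed by a memory-mapped file, a trim like this consumes no page-file space; dirty pages are written back to the mapped file itself.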
| Comment by wei lu [ 30/Nov/11 ] |
|
Thank you for your response. I have recently been reading the source code in "db_10.sln", especially the code relevant to memory-mapped files. I found that views of a file are mapped into the process's address space when we insert or select a document, but the memory is not unmapped until we shut down the process, so memory is used up as more and more documents are touched. I think that if memory pages which have not been touched for a long time were unmapped, memory usage might be managed better, but I don't know whether it is possible to do that... (see the sketch below) |
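A minimal sketch of the map-on-access pattern being described, with a hypothetical data file name; this is illustrative Win32 usage, not code from db_10.sln:

```cpp
// Map a view of a data file, touch it, and then unmap it.  The unmap
// step is what wei lu suggests doing for long-untouched views instead
// of deferring it to process shutdown.
#include <windows.h>
#include <iostream>

int main() {
    HANDLE file = CreateFileA("data.0", GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        std::cerr << "CreateFileA failed: " << GetLastError() << "\n";
        return 1;
    }
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE,
                                        0, 0, NULL);   // map whole file
    if (mapping != NULL) {
        char* view = static_cast<char*>(
            MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0));
        if (view != NULL) {
            view[0] = 'x';          // touch a page through the view
            UnmapViewOfFile(view);  // the step currently deferred to shutdown
        }
        CloseHandle(mapping);
    }
    CloseHandle(file);
    return 0;
}
```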
| Comment by Tad Marshall [ 29/Nov/11 ] |
|
Sorry for the delay in responding. I have an ongoing project to get to the bottom of multiple issues with memory usage on Windows. Unless I can determine something that we are not doing right, we may need to follow up with Microsoft to determine what is wrong.

The short story is that our use of memory-mapped files should not present Windows with the kind of memory demand that we see. When a normal process uses memory, it either allocates it from a pool using an API that eventually turns into a call to HeapAlloc(), or it reserves private memory for itself using VirtualAlloc() and then "commits" it by writing into it. In both of these cases, to reclaim the physical memory, Windows must page that data out to the page file. This is not the case for MongoDB's memory-mapped files. When Windows needs physical RAM for any process, it can page our memory-mapped file out to the file itself and free the memory that way. Memory-mapped files do not consume page-file space: they act as their own page files.

This doesn't work the way it should under load. Somehow, Windows ends up consuming more memory, leaving less for other processes, and even less for MongoDB itself. Under extreme load, "free" memory dwindles until everything becomes dog slow and things start to fail.

This isn't a very good answer and I apologize for that, but my research so far hasn't found the precise place where a fix can be made, by either us or Microsoft. I'll update the bug with more and better information as soon as I can. Thank you for your patience! |
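A minimal sketch of the two kinds of memory contrasted above, with a hypothetical file name; reclaiming the committed private pages requires Windows to write them to the page file, while the mapped view's dirty pages can be written back to the data file itself:

```cpp
#include <windows.h>

int main() {
    // Private memory: reserved and committed via VirtualAlloc.  Dirty
    // pages here are backed by the system page file.
    char* priv = (char*)VirtualAlloc(NULL, 1 << 20,
                                     MEM_RESERVE | MEM_COMMIT,
                                     PAGE_READWRITE);
    if (priv != NULL)
        priv[0] = 1;                // touching the page uses physical RAM

    // File-backed memory: dirty pages of this view can be paged back
    // to data.bin itself, consuming no page-file space.
    HANDLE file = CreateFileA("data.bin", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, OPEN_ALWAYS,
                              FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE,
                                            0, 1 << 20, NULL);
        if (mapping != NULL) {
            char* mapped = (char*)MapViewOfFile(mapping,
                                                FILE_MAP_ALL_ACCESS,
                                                0, 0, 0);
            if (mapped != NULL) {
                mapped[0] = 1;      // dirty page backed by data.bin
                UnmapViewOfFile(mapped);
            }
            CloseHandle(mapping);
        }
        CloseHandle(file);
    }
    if (priv != NULL)
        VirtualFree(priv, 0, MEM_RELEASE);
    return 0;
}
```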
| Comment by wei lu [ 17/Nov/11 ] |
|
cp.zip contains an executable file, but I am not sure whether it will work on your machine. You can run the Python script instead. |
| Comment by Tad Marshall [ 17/Nov/11 ] |
|
If this is using dummy (non-proprietary) data, would it be possible for you to upload your code so that we could reproduce your results? Performance might be affected by document sizes and index paging, and we could give better advice if we could see the details of what is driving performance in your specific test cases. Thanks! |