[CSHARP-904] C# driver memory leak Created: 02/Feb/14 Updated: 05/Apr/16 Resolved: 20/Jun/14 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | None |
| Affects Version/s: | 1.8.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Vincent | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | c#, driver, leak, memory, memory-leak | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows Server 2012 R2 x64 / 8.1 x64 |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Memory leak in the MongoDB .NET driver. I was able to reproduce the issue using the attached test solution (except that it occurs much more slowly than in the real app, because the objects are much smaller). I'm not sure whether the unreliable MongoDB servers are the cause of the memory leak. I can't post the Ants Profiler results, because they could contain sensitive information (connection strings, etc.) |
| Comments |
| Comment by Vincent [ 20/Jun/14 ] | |||||||||
|
This problem just reappeared for me since I upgraded to 2.6.2 (and now 2.6.3, same issue). Everything is fine for some days, then I have a very sudden burst in memory consumption, to the point that I get an OutOfMemoryException (on a server with 32GB of RAM!). | |||||||||
| Comment by Craig Wilson [ 31/Mar/14 ] | |||||||||
|
Hi Vincent/Peter, After much trying and 3 different memory profilers, I simply can't find any leaks. I do see a steady rise in the quantity of memory required, but I believe this is related to the test program. It uses more threads than I have CPUs, and then attempts to parallelize upserts of 1000 documents. Ultimately, this will lead to a lot of documents being in memory at any given time. In addition, in order to serialize classes, we end up with 2x the size of a document in memory (one for the object and one for the serialized form). I also increased the size of the documents to about 10MB, which makes this much more pronounced. At some point in the distant future, things basically leveled off for me. I also added finalizers to all the classes that have lots of allocations and were taking a lot of memory, and none of those finalizers ever get called (because the Dispose methods were functioning correctly). I'm out of ideas at this point, so I'd love to hear your thoughts. I'm not saying there isn't a problem, I'm just not finding it. | |||||||||
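The finalizer check described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `TrackedBuffer` and `LeakedCount` are hypothetical names, and the pattern assumes that `Dispose()` calls `GC.SuppressFinalize`, so a finalizer only ever runs for an instance that was dropped without being disposed.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a class that owns a large allocation.
sealed class TrackedBuffer : IDisposable
{
    // Number of instances that reached the finalizer without Dispose().
    public static int LeakedCount;

    private bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        // A correctly disposed instance never runs its finalizer.
        GC.SuppressFinalize(this);
    }

    // Runs only if the instance is collected without having been disposed.
    ~TrackedBuffer()
    {
        Interlocked.Increment(ref LeakedCount);
    }
}

static class Program
{
    static void UseCorrectly()
    {
        using (var buffer = new TrackedBuffer())
        {
            // work with the buffer
        }
    }

    static void ForgetToDispose()
    {
        new TrackedBuffer(); // dropped without Dispose()
    }

    static void Main()
    {
        UseCorrectly();
        ForgetToDispose();

        // Force a collection and wait for pending finalizers to run.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("Finalized without Dispose: " + TrackedBuffer.LeakedCount);
    }
}
```

If the counter stays at zero under load, as reported above, the instrumented classes are being disposed on every path that was exercised.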
| Comment by Craig Wilson [ 13/Mar/14 ] | |||||||||
|
I don't think it was ever assigned to me, I just happened to be triaging it with your help. Update: not much in the way of good news. This kind of leak is extremely difficult to find and I haven't yet located it. Still looking. I'm trying different things. For instance, with your sample program, I've changed the size of the documents that get inserted/updated to 1MB. It should make this more pronounced. I'm running it now... | |||||||||
| Comment by Peter Aberline [ 13/Mar/14 ] | |||||||||
|
Hi, Any updates on this, Craig? I see this issue is no longer assigned to you and is currently unassigned. Thanks, | |||||||||
| Comment by Vincent [ 11/Mar/14 ] | |||||||||
|
No, it catches the exceptions thrown on the upsert. In fact I noticed that the leak was happening all the time when my servers were overloaded (and a lot of exceptions were thrown), but I'm not sure whether it still happens on my system since I upgraded to much more powerful servers. I'm NOT SURE – which means it can still happen, but I didn't check. That's why I'm asking, maybe it's a good lead to follow.
| |||||||||
| Comment by Peter Aberline [ 11/Mar/14 ] | |||||||||
|
Hi Vincent, All I did was run the code you supplied. When it ran out of resources it did throw exceptions, yes. But during execution it didn't appear to throw exceptions, as your code stops the work thread in that case. To be sure, I've added some explicit logging and I'm running it again. Thanks | |||||||||
| Comment by Vincent [ 11/Mar/14 ] | |||||||||
|
Hi Peter, | |||||||||
| Comment by Peter Aberline [ 11/Mar/14 ] | |||||||||
|
Hi Thanks, | |||||||||
| Comment by Craig Wilson [ 11/Mar/14 ] | |||||||||
|
Hi Peter, Thanks for this run. What your profile results seem to show is that our BsonChunkPool is holding on to memory. This is what it is supposed to be doing as it is a pool of BsonChunks. I'm going to start investigating if we are leaking chunks that aren't getting returned. If that were to happen, I'd expect it to manifest itself a little differently, but you never know. Craig | |||||||||
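To make the "chunks that aren't getting returned" hypothesis concrete, here is a small sketch of a pool that counts outstanding chunks. `ChunkPool` is a hypothetical class written for illustration and is not the driver's BsonChunkPool; its names and sizes are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool, NOT the driver's BsonChunkPool: it only shows the accounting.
sealed class ChunkPool
{
    private readonly object _lock = new object();
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly int _chunkSize;
    private int _rented; // chunks handed out and not yet returned

    public ChunkPool(int chunkSize) { _chunkSize = chunkSize; }

    public int RentedCount { get { lock (_lock) { return _rented; } } }
    public int FreeCount { get { lock (_lock) { return _free.Count; } } }

    public byte[] Rent()
    {
        lock (_lock)
        {
            _rented++;
            // Reuse a pooled chunk when one is available, otherwise allocate.
            return _free.Count > 0 ? _free.Pop() : new byte[_chunkSize];
        }
    }

    public void Return(byte[] chunk)
    {
        lock (_lock)
        {
            _rented--;
            _free.Push(chunk);
        }
    }
}

static class Program
{
    static void Main()
    {
        var pool = new ChunkPool(64 * 1024);

        for (int i = 0; i < 100; i++)
        {
            byte[] chunk = pool.Rent();
            bool forgetToReturn = (i % 10 == 0); // simulate a leak on every 10th use
            if (!forgetToReturn)
            {
                pool.Return(chunk);
            }
        }

        // Prints "rented=10 free=1": the free list stays small while the
        // rented count grows with the amount of work done.
        Console.WriteLine("rented=" + pool.RentedCount + " free=" + pool.FreeCount);
    }
}
```

A pool that is merely holding memory shows a stable free count and a rented count that returns to zero when idle; leaked chunks show up as a rented count that never comes back down.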
| Comment by Peter Aberline [ 09/Mar/14 ] | |||||||||
|
Hi,
I've attached the Ants Profile results as "memory_leak_test_results.zip". | |||||||||
| Comment by Craig Wilson [ 07/Mar/14 ] | |||||||||
|
I've been running it since yesterday and not seeing much of anything. It slowly increases for an initial period and then levels, and stays consistent. I'm watching the Private Bytes and # Bytes in all Heaps performance counters and they aren't really showing much. I'll look forward to your report... Thanks for helping with this. | |||||||||
| Comment by Peter Aberline [ 07/Mar/14 ] | |||||||||
|
I've been running this overnight on a test VM using Ants Profiler and I'm getting some interesting results. For the first 12 hours or so the memory usage was stable, but after that it seems to have sprung a small leak and now the memory usage of the test program is slowly increasing. I've posted a screenshot as: "ants_results_in_progress.png" I'll leave it running over the weekend and post the Ants results when it finally runs out of resources. Thanks | |||||||||
| Comment by Craig Wilson [ 06/Mar/14 ] | |||||||||
|
Ok. I'll let it run overnight and see what happens. | |||||||||
| Comment by Peter Aberline [ 06/Mar/14 ] | |||||||||
|
I reproduced it with a single node running on my dev box. No shards, no replica sets. | |||||||||
| Comment by Craig Wilson [ 06/Mar/14 ] | |||||||||
|
I didn't let it run overnight, so I probably just didn't let it run long enough... Can you tell me about your shards? Is one up and one down, are they all up, or are they all down? It breaks early if there are none up, and doesn't break if they are all available. I was also using a server on my local box, so what exactly is the setup you're using when running this? | |||||||||
| Comment by Peter Aberline [ 06/Mar/14 ] | |||||||||
|
And this was the output of the test program:
12428 - Started
Unhandled Exception: Too many threads are already waiting for a connection.
Unhandled Exception: Too many threads are already waiting for a connection.
at System.Collections.Concurrent.Partitioner.DynamicPartitionEnumerator_Abstract`2.MoveNext()
System.AggregateException: One or more errors occurred. ---> MongoDB.Driver.MongoConnectionException: Too many threads are already waiting for a conne
at System.Collections.Concurrent.Partitioner.DynamicPartitionEnumerator_Abstract`2.MoveNext()
Too many threads are already waiting for a connection. | |||||||||
| Comment by Peter Aberline [ 06/Mar/14 ] | |||||||||
|
Hi Craig, | |||||||||
| Comment by Craig Wilson [ 06/Mar/14 ] | |||||||||
|
Hi Peter. I'm looking into it. Nothing jumped out as to the cause and the problems don't seem to occur when the servers are stable and only occasionally when they aren't (and even then, it seems to self-correct), so I'm really just trying to reproduce still. | |||||||||
| Comment by Peter Aberline [ 06/Mar/14 ] | |||||||||
|
Any updates on this? I've looked through the GitHub commits and not seen any check-ins for this. We have seen "Timeout waiting for a connection" and "Too many threads are already waiting for a connection" exceptions during our testing. I increased the connectionTimeout, maxPoolSize, waitQueueSize and waitQueueTimeout parameters in the client connection string to address this, but it would be good to be able to eliminate this leak as the cause. Thanks | |||||||||
| Comment by Craig Wilson [ 03/Feb/14 ] | |||||||||
|
Thanks for the report Vincent. We'll begin looking into this and try to repro. Craig | |||||||||
| Comment by Vincent [ 03/Feb/14 ] | |||||||||
|
Seems really related to exceptions thrown by the driver, but I don't know which ones exactly. I think some resources aren't correctly disposed when an exception is thrown.
When the servers are OK, I don't have the memory leaks. |
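The kind of failure suspected here (a resource that is only released on the success path) can be illustrated with a small sketch. `Lease`, `LeakyUpsert` and `SafeUpsert` are hypothetical names used for illustration only; nothing below is taken from the driver.

```csharp
using System;

// Hypothetical resource that tracks how many instances are still open.
sealed class Lease : IDisposable
{
    public static int Open;
    public Lease() { Open++; }
    public void Dispose() { Open--; }
}

static class Program
{
    // Leaky: if the work throws, Dispose() is never reached.
    public static void LeakyUpsert(bool fail)
    {
        var lease = new Lease();
        if (fail) throw new InvalidOperationException("simulated server error");
        lease.Dispose();
    }

    // Safe: 'using' runs Dispose() on both the normal and the exception path.
    public static void SafeUpsert(bool fail)
    {
        using (var lease = new Lease())
        {
            if (fail) throw new InvalidOperationException("simulated server error");
        }
    }

    static void Main()
    {
        for (int i = 0; i < 5; i++)
        {
            try { LeakyUpsert(true); } catch (InvalidOperationException) { }
        }
        Console.WriteLine("open after leaky calls: " + Lease.Open); // prints 5

        for (int i = 0; i < 5; i++)
        {
            try { SafeUpsert(true); } catch (InvalidOperationException) { }
        }
        Console.WriteLine("open after safe calls: " + Lease.Open); // still 5, no new leaks
    }
}
```

A leak of this shape would only grow while exceptions are being thrown, which would match the observation that memory is stable when the servers are healthy.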