[CSHARP-817] Randomly freezing Created: 10/Sep/13 Updated: 04/Apr/15 Resolved: 04/Apr/15 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | None |
| Affects Version/s: | 1.8.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | John Smith | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | freeze | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Server farm mongodb, replicaset + shard version 2.4.6 (64bit 2008R2) |
||
| Issue Links: |
|
||||||||
| Description |
|
A SystemFileWatcher is running and delegate execution using reflection, the system perform multiple kind of actions,including data import into MongoDB since 2 years. The system suddenly start freezing when handling MongoDB import. This occurs after switching the project to .NET Framework 4.5 . Please find hereafter the failing code: no exception triggered, only apps freezing. public bool Save(MongoCollection col, bool replace) else } |
| Comments |
| Comment by John Smith [ 18/Mar/14 ] |
|
Using procexplorer and remote debugger we finally identified the thread eating up the CPU resources. MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.Ping(MongoDB.Driver.Internal.MongoConnection connection) Line 756 |
| Comment by John Smith [ 12/Mar/14 ] |
|
I've now installed remote debugger 2013 in order to investigate further. I've notice the following issue when apps is freezing and CPU peaking at 100% I've multiple (1 up to 3) threads with a common stack trace reaching MongoDB.Bson.dll!MongoDB.Bson.IO.ByteBufferFactory.Create Releasing the apps and debugging few seconds later doesn't change a thing, the thread remains with same stack trace which can be found hereafter: MongoDB.Bson.dll!MongoDB.Bson.IO.ByteBufferFactory.Create |
| Comment by John Smith [ 13/Feb/14 ] |
|
Debugging further, using procmon and then dumping the application I realized how the driver works( AcquireConnection), when freezing I've obviously a dead lock with plenty of thread pending with WaitForSingleObject. I was wondering if we could change the drivers so the AcquireConnectionOptions are made public and can be set from the host application so I can play around and try different settings. _defaultAcquireConnectionOptions = new AcquireConnectionOptions ; |
| Comment by John Smith [ 29/Nov/13 ] |
|
I've finally added a logging before any GridFs upload in order to get a bit more information and cross-reference client side log with server log. I've now added the following codeline: Last time the CPU freezed I get the following log result: File successfully uploaded: , "timeAcquiringMicros" : { } } }, { "opid" : "rs001:2631", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(522) }, "timeAcquiringMicros" : { } } }, { "opid" : "rs001:12208193", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332716581486601") }} }, "client_s" : "172.16.12.213:52732", "desc" : "conn193287", "connectionId" : 193287, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(8), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }, { "opid" : "rs001:12208192", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951201419431247878") }} }, "client_s" : "172.16.12.211:61086", "desc" : "conn188986", "connectionId" : 188986, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(10), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }, { "opid" : "rs002:11497523", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332858315407363") }} }, "client_s" : "172.16.12.213:53145", "desc" : "conn174791", "connectionId" : 174791, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(8), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }, { "opid" : "rs002:11497522", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332858315407363") }} }, "client_s" : "172.16.12.211:52278", "desc" : "conn174424", "connectionId" : 174424, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(8), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }, { "opid" : "rs002:11497525", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951331969257177089") }} }, "client_s" : "172.16.12.211:59754", "desc" : "conn136", "connectionId" : 136, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(18), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(2), "w" : NumberLong(0) }} }, { "opid" : "rs002:2924", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(301) }, "timeAcquiringMicros" : { } } }, { "opid" : "rs002:11497524", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951331969257177089") }} }, "client_s" : "172.16.12.213:64220", "desc" : "conn174", "connectionId" : 174, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(14), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }, { "opid" : "rs003:11207241", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") }} }, "client_s" : "172.16.12.213:63993", "desc" : "conn128", "connectionId" : 128, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(11), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }, { "opid" : "rs003:11207236", "active" : true, "secs_running" : 2, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") }} }, "client_s" : "172.16.12.211:59753", "desc" : "conn134", "connectionId" : 134, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(47), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(5), "w" : NumberLong(0) }} }] } File never uploaded ... application had freezed: , "timeAcquiringMicros" : { } } }, { "opid" : "rs001:2631", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(522) }, "timeAcquiringMicros" : { } } }, { "opid" : "rs001:12209002", "active" : true, "secs_running" : 3, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332716581486601") }} }, "client_s" : "172.16.12.213:52732", "desc" : "conn193287", "connectionId" : 193287, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(57), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(25), "w" : NumberLong(0) }} }, { "opid" : "rs001:12209001", "active" : true, "secs_running" : 3, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951201419431247878") }} }, "client_s" : "172.16.12.211:61086", "desc" : "conn188986", "connectionId" : 188986, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(54), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(6), "w" : NumberLong(0) }} }, { "opid" : "rs002:11501031", "active" : true, "secs_running" : 1, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951336886994731014") }} }, "client_s" : "172.16.12.213:54016", "desc" : "conn175528", "connectionId" : 175528, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(28), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(4), "w" : NumberLong(0) }} }, { "opid" : "rs002:11501028", "active" : true, "secs_running" : 2, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332858315407363") }} }, "client_s" : "172.16.12.211:52278", "desc" : "conn174424", "connectionId" : 174424, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(46), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(7), "w" : NumberLong(0) }} }, { "opid" : "rs002:2924", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(301) }, "timeAcquiringMicros" : { } } }, { "opid" : "rs003:11208053", "active" : true, "secs_running" : 3, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") }} }, "client_s" : "172.16.12.213:63993", "desc" : "conn128", "connectionId" : 128, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(65), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(9), "w" : NumberLong(0) }} }, { "opid" : "rs003:11208060", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") }} }, "client_s" : "172.16.12.211:59753", "desc" : "conn134", "connectionId" : 134, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(11), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) }} }] } |
| Comment by John Smith [ 17/Nov/13 ] |
|
C# Driver version upgraded to 1.8.3 and server upgraded to 2.4.7 The problem still occurred randomly, in such case the execution never pass the following code "col.Database.GridFS.Upload(LocalFilename, this.Name + this.Ext);" |
| Comment by John Smith [ 04/Oct/13 ] |
|
Hello, I've recently been able to catch the following exception (not leading immediately to high CPU usage) which might be associated to my problem. It directly refer to my "Save" method Message = "Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size) |
| Comment by John Smith [ 11/Sep/13 ] |
|
Good morning Craig, We only moved the target Framework of our project from 3.5 to 4.5 without upgrading the driver. We have from 1 up to 3 freezing per day, they occured at any time whatever the stream of file is loaded or not. This method is only called after empty result from query checking md5 value. The way we use MongoDB might be important and it is why I added the code. |
| Comment by Craig Wilson [ 10/Sep/13 ] |
|
I assume you are using version 1.8.2 as that is the affects version. What framework were you on before upgrading to .NET 4.5? Did you upgrade your driver version when you upgraded to .NET 4.5 and, if so, what did you upgrade from? If the app is freezing without an exception or stacktrace, how do you know where the app is freezing? Could you pinpoint the line so we can attempt to reproduce? Is there anyway you could create a small little sample app that has this problem. |