[CSHARP-817] Randomly freezing Created: 10/Sep/13  Updated: 04/Apr/15  Resolved: 04/Apr/15

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.8.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: John Smith Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: freeze
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Server farm mongodb, replicaset + shard version 2.4.6 (64bit 2008R2)
Client app is running on .NET Framework 4.5


Issue Links:
Related
related to CSHARP-829 MongoCollection.FindAs hangs when Dri... Closed

 Description   

A FileSystemWatcher is running and delegates execution using reflection; the system performs multiple kinds of actions, including data import into MongoDB, and has been doing so for two years. The system suddenly started freezing when handling MongoDB imports. This began after switching the project to .NET Framework 4.5.
I have never found any way to predict the freezing behavior, and restarting the application allows processing to continue.

Please find the failing code below: no exception is triggered, the app simply freezes.

public bool Save(MongoCollection col, bool replace)
{
    this.CountSet++;
    if (Id == ObjectId.Empty)
    {
        MongoGridFSFileInfo gridFsInfo = col.Database.GridFS.Upload(LocalFilename, this.Name + this.Ext);
        this.Id = gridFsInfo.Id.AsObjectId;
        col.Save(this, WriteConcern.Acknowledged);
    }
    else
    {
        if (replace)
        {
            using (var fs = new FileStream(LocalFilename, FileMode.Open))
            {
                col.Database.GridFS.DeleteById(Id);
                MongoGridFSCreateOptions opt = new MongoGridFSCreateOptions();
                opt.Id = Id;
                MongoGridFSFileInfo gridFsInfo = col.Database.GridFS.Upload(fs, this.Name + this.Ext, opt);
            }
        }
        col.Save(this, WriteConcern.Acknowledged);
    }
    return true;
}



 Comments   
Comment by John Smith [ 18/Mar/14 ]

Using Process Explorer and the remote debugger, we finally identified the thread eating up the CPU resources.
The culprit appears to be MongoDB.Driver.MongoServerInstance.Ping:

MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.Ping(MongoDB.Driver.Internal.MongoConnection connection) Line 756
MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.StateVerificationTimerCallback() Line 811
MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.Connect.AnonymousMethod__2(object o) Line 522
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.TimerQueueTimer.CallCallback()
mscorlib.dll!System.Threading.TimerQueueTimer.Fire()
mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()

Comment by John Smith [ 12/Mar/14 ]

I've now installed the Remote Debugger 2013 in order to investigate further.

I've noticed the following when the app is freezing and the CPU is peaking at 100%:

I have multiple (1 to 3) threads with a common stack trace reaching MongoDB.Bson.dll!MongoDB.Bson.IO.ByteBufferFactory.Create

Releasing the app and debugging again a few seconds later doesn't change anything; the threads remain stuck with the same stack trace, shown below:

MongoDB.Bson.dll!MongoDB.Bson.IO.ByteBufferFactory.Create
MongoDB.Bson.dll!MongoDB.Bson.IO.ByteBufferFactory.Create(MongoDB.Bson.IO.BsonChunkPool chunkPool, int fixedCapacity)
MongoDB.Bson.dll!MongoDB.Bson.IO.ByteBufferFactory.LoadFrom(System.IO.Stream stream)
MongoDB.Driver.dll!MongoDB.Driver.Internal.MongoConnection.ReceiveMessage<MongoDB.Driver.CommandResult>(MongoDB.Bson.IO.BsonBinaryReaderSettings readerSettings, MongoDB.Bson.Serialization.IBsonSerializer serializer, MongoDB.Bson.Serialization.IBsonSerializationOptions serializationOptions)
MongoDB.Driver.dll!MongoDB.Driver.Operations.CommandOperation<MongoDB.Driver.CommandResult>.Execute(MongoDB.Driver.Internal.MongoConnection connection)
MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.RunCommandAs<MongoDB.Driver.CommandResult>(MongoDB.Driver.Internal.MongoConnection connection, string databaseName, MongoDB.Driver.IMongoCommand command)
MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.Ping(MongoDB.Driver.Internal.MongoConnection connection)
MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.StateVerificationTimerCallback()
MongoDB.Driver.dll!MongoDB.Driver.MongoServerInstance.Connect.AnonymousMethod__2(object o)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.TimerQueueTimer.CallCallback()
mscorlib.dll!System.Threading.TimerQueueTimer.Fire()
mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()

Comment by John Smith [ 13/Feb/14 ]

Debugging further, using procmon and then dumping the application, I got a better picture of how the driver works (AcquireConnection). When the freeze happens I clearly have a deadlock, with plenty of threads pending in WaitForSingleObject.

I was wondering if the driver could be changed so that the AcquireConnectionOptions are made public and can be set from the host application; that would let me play around and try different settings.
I'm referring to MongoConnectionPool.cs:

_defaultAcquireConnectionOptions = new AcquireConnectionOptions
{
    OkToAvoidWaitingByCreatingNewConnection = true,
    OkToExceedMaxConnectionPoolSize = false,
    OkToExceedWaitQueueSize = false,
    WaitQueueTimeout = _settings.WaitQueueTimeout
};
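
In the meantime, the pool-related knobs that are already public can be tuned through MongoClientSettings or the connection string. A minimal sketch, assuming a 1.8.x driver (the host name and values below are placeholders, not recommendations):

// Sketch only: publicly available pool/wait-queue settings in the 1.x driver
// (placeholder host and values, not a recommended configuration).
var settings = new MongoClientSettings
{
    Server = new MongoServerAddress("mongos-host", 27017), // placeholder
    MaxConnectionPoolSize = 100,
    WaitQueueSize = 500,
    WaitQueueTimeout = TimeSpan.FromSeconds(30),
    SocketTimeout = TimeSpan.FromSeconds(60) // if I read MongoDefaults correctly, this is 0 (no timeout) by default, so a dead connection can block forever
};
var client = new MongoClient(settings);

// Roughly equivalent connection-string form:
// mongodb://mongos-host:27017/?maxPoolSize=100&waitQueueTimeoutMS=30000&socketTimeoutMS=60000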

Comment by John Smith [ 29/Nov/13 ]

I've finally added logging before every GridFS upload in order to get a bit more information and to cross-reference the client-side log with the server log.

I've added the following line of code:
WriteLog(col.Database.GetCurrentOp().ToString()); //write into a flat file
MongoGridFSFileInfo gridFsInfo = col.Database.GridFS.Upload(LocalFilename, this.Name + this.Ext);

The last time the CPU froze, I got the following log result:

File successfully uploaded:
28/11/13 16:46:37 { "inprog" : [
{ "opid" : "rs001:2633", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 2", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(550) }, "timeAcquiringMicros" : { } } },
{ "opid" : "rs001:2631", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(522) }, "timeAcquiringMicros" : { } } },
{ "opid" : "rs001:12208193", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332716581486601") } } }, "client_s" : "172.16.12.213:52732", "desc" : "conn193287", "connectionId" : 193287, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(8), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } },
{ "opid" : "rs001:12208192", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951201419431247878") } } }, "client_s" : "172.16.12.211:61086", "desc" : "conn188986", "connectionId" : 188986, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(10), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } },
{ "opid" : "rs002:11497523", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332858315407363") } } }, "client_s" : "172.16.12.213:53145", "desc" : "conn174791", "connectionId" : 174791, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(8), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } },
{ "opid" : "rs002:11497522", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332858315407363") } } }, "client_s" : "172.16.12.211:52278", "desc" : "conn174424", "connectionId" : 174424, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(8), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } },
{ "opid" : "rs002:11497525", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951331969257177089") } } }, "client_s" : "172.16.12.211:59754", "desc" : "conn136", "connectionId" : 136, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(18), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(2), "w" : NumberLong(0) } } },
{ "opid" : "rs002:2924", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(301) }, "timeAcquiringMicros" : { } } },
{ "opid" : "rs002:11497524", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951331969257177089") } } }, "client_s" : "172.16.12.213:64220", "desc" : "conn174", "connectionId" : 174, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(14), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } },
{ "opid" : "rs003:11207241", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") } } }, "client_s" : "172.16.12.213:63993", "desc" : "conn128", "connectionId" : 128, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(11), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } },
{ "opid" : "rs003:11207236", "active" : true, "secs_running" : 2, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") } } }, "client_s" : "172.16.12.211:59753", "desc" : "conn134", "connectionId" : 134, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(47), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(5), "w" : NumberLong(0) } } }
] }

File never uploaded ... the application had frozen:
28/11/13 16:49:41 { "inprog" : [
{ "opid" : "rs001:2633", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 2", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(550) }, "timeAcquiringMicros" : { } } },
{ "opid" : "rs001:2631", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(522) }, "timeAcquiringMicros" : { } } },
{ "opid" : "rs001:12209002", "active" : true, "secs_running" : 3, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332716581486601") } } }, "client_s" : "172.16.12.213:52732", "desc" : "conn193287", "connectionId" : 193287, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(57), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(25), "w" : NumberLong(0) } } },
{ "opid" : "rs001:12209001", "active" : true, "secs_running" : 3, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951201419431247878") } } }, "client_s" : "172.16.12.211:61086", "desc" : "conn188986", "connectionId" : 188986, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(54), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(6), "w" : NumberLong(0) } } },
{ "opid" : "rs002:11501031", "active" : true, "secs_running" : 1, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951336886994731014") } } }, "client_s" : "172.16.12.213:54016", "desc" : "conn175528", "connectionId" : 175528, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(28), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(4), "w" : NumberLong(0) } } },
{ "opid" : "rs002:11501028", "active" : true, "secs_running" : 2, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5951332858315407363") } } }, "client_s" : "172.16.12.211:52278", "desc" : "conn174424", "connectionId" : 174424, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(46), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(7), "w" : NumberLong(0) } } },
{ "opid" : "rs002:2924", "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(301) }, "timeAcquiringMicros" : { } } },
{ "opid" : "rs003:11208053", "active" : true, "secs_running" : 3, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") } } }, "client_s" : "172.16.12.213:63993", "desc" : "conn128", "connectionId" : 128, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(65), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(9), "w" : NumberLong(0) } } },
{ "opid" : "rs003:11208060", "active" : true, "secs_running" : 0, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { "ts" : { "$gte" : { "$timestamp" : NumberLong("5897057270696509441") } } }, "client_s" : "172.16.12.211:59753", "desc" : "conn134", "connectionId" : 134, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(11), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(1), "w" : NumberLong(0) } } }
] }

Comment by John Smith [ 17/Nov/13 ]

C# Driver version upgraded to 1.8.3 and server upgraded to 2.4.7

The problem still occurs randomly; when it does, execution never gets past the following line: col.Database.GridFS.Upload(LocalFilename, this.Name + this.Ext);
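
For what it's worth, one way to at least detect the hang instead of blocking the importer forever would be to run the upload on a separate task with a timeout. A rough sketch (the timeout value is a placeholder, and WriteLog is our own logging helper):

// Sketch: run the GridFS upload on a worker task so the hang can be logged
// instead of blocking the importer thread forever.
var uploadTask = Task.Run(() => col.Database.GridFS.Upload(LocalFilename, this.Name + this.Ext));

if (!uploadTask.Wait(TimeSpan.FromMinutes(5))) // placeholder timeout
{
    WriteLog("GridFS upload still pending after 5 minutes: " + LocalFilename);
    // The underlying connection is still blocked at this point; so far the only
    // reliable recovery has been restarting the application.
}
else
{
    MongoGridFSFileInfo gridFsInfo = uploadTask.Result;
    this.Id = gridFsInfo.Id.AsObjectId;
}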

Comment by John Smith [ 04/Oct/13 ]

Hello, I've recently been able to catch the following exception (not leading immediately to high CPU usage) which might be related to my problem. It refers directly to my "Save" method:

Message = "Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond."

at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at MongoDB.Bson.IO.ByteBufferFactory.LoadFrom(Stream stream)
at MongoDB.Driver.Internal.MongoConnection.ReceiveMessage[TDocument](BsonBinaryReaderSettings readerSettings, IBsonSerializer serializer, IBsonSerializationOptions serializationOptions)
at MongoDB.Driver.Operations.WriteOperation.SendMessageWithWriteConcern(MongoConnection connection, BsonBuffer buffer, Int32 requestId, BsonBinaryReaderSettings readerSettings, BsonBinaryWriterSettings writerSettings, WriteConcern writeConcern)
at MongoDB.Driver.Operations.InsertOperation.Execute(MongoConnection connection)
at MongoDB.Driver.MongoCollection.InsertBatch(Type nominalType, IEnumerable documents, MongoInsertOptions options)
at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document, MongoInsertOptions options)
at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document, WriteConcern writeConcern)
at MongoDB.Driver.MongoCollection.Insert[TNominalType](TNominalType document, WriteConcern writeConcern)
at MongoDB.Driver.MongoCollection`1.Insert(TDefaultDocument document, WriteConcern writeConcern)
at MongoDB.Driver.GridFS.MongoGridFS.Upload(Stream stream, String remoteFileName, MongoGridFSCreateOptions createOptions)
at MongoDB.Driver.GridFS.MongoGridFS.Upload(Stream stream, String remoteFileName)
at MongoDB.Driver.GridFS.MongoGridFS.Upload(String localFileName, String remoteFileName)
at "xxx.xxxx.xxxx.xxxx.Save(MongoCollection col, Boolean replace)

Comment by John Smith [ 11/Sep/13 ]

Good morning Craig,

We only moved the target framework of our project from 3.5 to 4.5, without upgrading the driver.
The app supports extended logging to a log file (which thread is handling what and when, with all the reflection steps).
The freezes come from one identified method, Save, on the class ArchivedFile; they always happen inside this method, but at random points in time.

We have 1 to 3 freezes per day, and they occur at any time, regardless of whether a stream of files is being loaded or not.

This method is only called after a query checking the MD5 value returns an empty result (see the sketch below).

The way we use MongoDB might be relevant, which is why I included the code:
we enforce the same ObjectId value between our main collection and the fs.files collection.
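
For reference, the MD5 pre-check that gates the call to Save looks roughly like this (a simplified sketch; ComputeMd5 is our own helper and the query shape is illustrative, not the exact production code):

// Simplified sketch of the gating logic: Save is only called when no fs.files
// entry with the same MD5 already exists (ComputeMd5 is our own helper).
string md5 = ComputeMd5(LocalFilename);
MongoGridFSFileInfo existing = col.Database.GridFS.FindOne(Query.EQ("md5", md5));
if (existing == null)
{
    Save(col, replace: false); // uploads the file and reuses the ObjectId in both collections
}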

Comment by Craig Wilson [ 10/Sep/13 ]

I assume you are using version 1.8.2 as that is the affects version. What framework were you on before upgrading to .NET 4.5? Did you upgrade your driver version when you upgraded to .NET 4.5 and, if so, what did you upgrade from?

If the app is freezing without an exception or stack trace, how do you know where it is freezing? Could you pinpoint the line so we can attempt to reproduce? Is there any way you could create a small sample app that exhibits this problem?
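
A minimal repro along these lines would probably be enough (a hypothetical sketch; the connection string, database name, and file path are placeholders):

// Hypothetical minimal repro: upload files to GridFS in a loop through the same
// replica-set/sharded deployment and watch for a run that never completes.
using System;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

class Program
{
    static void Main()
    {
        var client = new MongoClient("mongodb://mongos-host:27017"); // placeholder
        var database = client.GetServer().GetDatabase("test");       // placeholder

        for (int i = 0; ; i++)
        {
            Console.WriteLine("{0:u} upload #{1} starting", DateTime.UtcNow, i);
            MongoGridFSFileInfo info = database.GridFS.Upload(
                @"C:\temp\sample.bin",        // placeholder local file
                "sample_" + i + ".bin");
            Console.WriteLine("{0:u} upload #{1} done, id {2}", DateTime.UtcNow, i, info.Id);
        }
    }
}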
