[CSHARP-326] Creating files with Upload() in Parallel threads causes md5 and chunk errors. Created: 16/Sep/11  Updated: 02/Apr/15  Resolved: 30/Sep/11

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.2
Fix Version/s: 1.3

Type: Bug Priority: Blocker - P1
Reporter: Andrew Finnell Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: commands, concurrency, driver
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Windows 7
Framework: .NET 4.0 and Mono 2.10.x


Attachments: GridTest.cs, GridTest.cs, Program.cs
Issue Links:
Related
is related to CSHARP-330 GridFS object is not inheriting SafeM... Closed

 Description   

This unit test fails with MongoDB.Driver.MongoCommandException: Command 'filemd5' failed: exception: chunks out of order (response: { "errmsg" : "exception: chunks out of order", "code" : 10040, "ok" : 0.0 }).

The server complains that:

Fri Sep 16 11:50:28 [conn518] should have chunk: 1 have:0
c->nextSafe():

{ _id: ObjectId('4e737044ef94a52e101deef8'), files_id: ObjectId(' 4e737044ef94a52e101deee4'), n: 0, data: BinData }

c->nextSafe():

{ _id: ObjectId('4e737044ef94a52e101deef1'), files_id: ObjectId(' 4e737044ef94a52e101deee4'), n: 0, data: BinData }

Fri Sep 16 11:50:29 [conn513] should have chunk: 1 have:0
c->nextSafe():

{ _id: ObjectId('4e737044ef94a52e101deef4'), files_id: ObjectId(' 4e737044ef94a52e101deee1'), n: 0, data: BinData }

c->nextSafe():

{ _id: ObjectId('4e737044ef94a52e101deeed'), files_id: ObjectId(' 4e737044ef94a52e101deee1'), n: 0, data: BinData }

Fri Sep 16 11:50:29 [conn509] should have chunk: 1 have:0
c->nextSafe():

{ _id: ObjectId('4e737044ef94a52e101def05'), files_id: ObjectId(' 4e737044ef94a52e101deefb'), n: 0, data: BinData }

c->nextSafe():

{ _id: ObjectId('4e737044ef94a52e101def00'), files_id: ObjectId(' 4e737044ef94a52e101deefb'), n: 0, data: BinData }

NOTE: If I change MaxConnectionPoolSize in MongoDbServerSettings to 1, meaning only a single connection can be alive at once, this error does not occur.

I'd also like to point out that using ThreadPool in the C# Connection Manager seems like a bad idea, seeing as this ThreadPool is static and shared with everything else in the system, allowing any code in the system to essentially "lock out" the WaitCallbacks.



 Comments   
Comment by Robert Stam [ 30/Sep/11 ]

safe=true only works for GridFS after CSHARP-330 was fixed so that the GridFS settings inherit the SafeMode of the database.

Comment by Robert Stam [ 30/Sep/11 ]

I can pretty consistently reproduce a problem with safe=false, although I see a different error (probably because I'm using localhost so all the network related timings and bottlenecks are different). I also have yet to see a problem with safe=true or with RequestStart.

By the way, using the Upload method is MUCH faster than using the stream based API (by orders of magnitude). You can upload the file like this instead:

var bytes = Enumerable.Range(0, fileSize).Select(b => (byte) b).ToArray();
using (var stream = new MemoryStream(bytes))
{
    grid.Upload(stream, filename);
}

The reason it is so much faster is that the stream based API has to allocate a full 256KB chunk because it doesn't know in advance how much data you are going to write. When the file is much smaller than this, most of the chunk is wasted. The Upload method handles the last chunk differently, so a small file consisting of a single small chunk is handled much more efficiently.
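To put a rough number on "most of the chunk is wasted", here is a small arithmetic sketch. It assumes only the 256KB figure quoted above; the ChunkWaste class and its method names are hypothetical and are not part of the driver.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of the MongoDB C# driver.
public static class ChunkWaste
{
    // 256 KB, the buffer size quoted in the comment above.
    public const long ChunkSize = 256 * 1024;

    // Bytes of the final 256 KB buffer that a file of the given size leaves unused.
    public static long UnusedBytesInLastBuffer(long fileSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (fileSize == 0) return ChunkSize;
        long remainder = fileSize % ChunkSize;
        return remainder == 0 ? 0 : ChunkSize - remainder;
    }

    // Fraction (0.0 to 1.0) of the final buffer that goes unused.
    public static double UnusedFraction(long fileSize)
    {
        return (double)UnusedBytesInLastBuffer(fileSize) / ChunkSize;
    }

    public static void Main()
    {
        long fileSize = 1024; // a 1 KB file, chosen arbitrarily for the example
        Console.WriteLine(string.Format(
            CultureInfo.InvariantCulture,
            "{0} of {1} buffer bytes unused ({2:P1})",
            UnusedBytesInLastBuffer(fileSize), ChunkSize, UnusedFraction(fileSize)));
    }
}
```

For a 1 KB file this gives 261120 unused bytes out of 262144, so more than 99% of the buffer carries no data.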

I'm going to mark this as resolved, because I think the anomalies can all be explained by safe=false and the absence of RequestStart.

Comment by Robert Stam [ 30/Sep/11 ]

Theory: when safe=false the AddMissingChunks method sometimes doesn't see the chunk 0 that has already been submitted (but on another connection and not yet written to the database) and decides to write a full length chunk of zeros. This either results in an md5 error, or just causes the data to be corrupted, depending on timing.

Setting safe=true will prevent this from happening, as will using RequestStart (which causes all operations for a single thread to happen on one connection so no anomalies occur).
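The theory above can be sketched as a deterministic toy model. It is an illustration only: ToyServer, ToyConnection and RaceDemo are hypothetical types that do not come from the driver, and the model assumes that an unacknowledged insert becomes visible to another connection only once its own connection has been drained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: none of these types come from the MongoDB driver.
public sealed class ToyServer
{
    // Chunk documents the server has actually applied, as (files_id, n) pairs.
    public List<(string FileId, int N)> Chunks { get; } = new List<(string FileId, int N)>();
}

public sealed class ToyConnection
{
    private readonly ToyServer server;
    private readonly Queue<(string FileId, int N)> pending = new Queue<(string FileId, int N)>();

    public ToyConnection(ToyServer server) { this.server = server; }

    // safe=false: the insert is queued and the caller continues without an acknowledgement.
    // safe=true: the caller waits until the server has applied the insert.
    public void InsertChunk(string fileId, int n, bool safe)
    {
        pending.Enqueue((fileId, n));
        if (safe) Flush();
    }

    // A query is processed after everything already sent on the SAME connection.
    public bool HasChunk(string fileId, int n)
    {
        Flush();
        return server.Chunks.Contains((fileId, n));
    }

    public void Flush()
    {
        while (pending.Count > 0) server.Chunks.Add(pending.Dequeue());
    }
}

public static class RaceDemo
{
    // Returns how many chunk documents with n == 0 the file ends up with.
    public static int Run(bool safe, bool sameConnection)
    {
        var server = new ToyServer();
        var writer = new ToyConnection(server);
        // sameConnection models RequestStart: both operations share one connection.
        var checker = sameConnection ? writer : new ToyConnection(server);
        const string fileId = "file-1";

        writer.InsertChunk(fileId, 0, safe);      // the real chunk 0 is submitted
        if (!checker.HasChunk(fileId, 0))         // the "missing chunks" check runs
        {
            checker.InsertChunk(fileId, 0, safe); // a zero-filled chunk 0 is written
        }

        writer.Flush();                           // eventually both connections drain
        checker.Flush();
        return server.Chunks.Count(c => c.FileId == fileId && c.N == 0);
    }

    public static void Main()
    {
        Console.WriteLine("safe=false, two connections: " + Run(false, false)); // 2
        Console.WriteLine("safe=true,  two connections: " + Run(true, false));  // 1
        Console.WriteLine("safe=false, one connection:  " + Run(false, true));  // 1
    }
}
```

In this model only the combination of safe=false and separate connections leaves two documents with n: 0 for the same files_id, which is the shape of the "should have chunk: 1 have:0" entries in the server log from the description.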

Comment by Robert Stam [ 29/Sep/11 ]

I am unable to reproduce this using the Program.cs file I've attached. It is likely the case that this now works because safe=true is now being correctly inherited by the GridFS settings.

Comment by Andrew Finnell [ 19/Sep/11 ]

My test had a fatal flaw in it. That's what I get for trying to adapt it. With the addition of RequestStart in my real code it seems to be succeeding. I am not sure why, though, as I really thought it had never worked. I will mark this as trivial and update it with any additional information. Regardless, I have never been able to receive an error in the C# driver. This is what concerns me the most: I cannot programmatically determine that a chunking error occurred.

Comment by Andrew Finnell [ 16/Sep/11 ]

Robert,

I apologize for the spam, but I quickly modified the test to do the Touch and Write within the same RequestStart(), and thus on the same thread, and it still fails. I just wanted to check.

Andrew

Comment by Andrew Finnell [ 16/Sep/11 ]

Updated GridTest to try with RequestStart(). Operations still fail.

Comment by Andrew Finnell [ 16/Sep/11 ]

Robert,

It still fails, but it suppresses the exception now, even if GetLastError is checked.

The Write() on the stream appears to succeed as far as the driver is concerned, but the result is a file with 0 size and chunk errors issued by the database.

I cannot do a Touch() and OpenRead() within the same RequestStart(). I need everything to be written and done in the server after the RequestStart() during the Touch() is finished.

Scenario: GridFS being used as a DFS. A process opens a "handle" to a file with UpdateOrCreate(). The unique id of the new file needs to be created and an empty file needs to exist within GridFS. The user then performs writes using the file handle which uses the stream obtained by the MongoDbFileInfo object.

Andrew

Comment by Robert Stam [ 16/Sep/11 ]

When using the streaming API you should use RequestStart/RequestDone (or RequestStart with a using statement) around the code that writes to the MongoGridFSStream object.

Can you add the call to RequestStart inside your parallel task and test again?

This might just be a documentation issue...?
