[CSHARP-306] incorrect index on fs.chunks Created: 19/Aug/11 Updated: 11/Mar/19 Resolved: 29/Aug/11 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 1.2 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Daniel Knippers | Assignee: | Robert Stam |
| Resolution: | Done | Votes: | 0 |
| Labels: | driver | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows, C#, Visual Studio 2010, multi-threaded, latest Git source |
||
| Description |
|
In MongoGridFS the index on fs.chunks { files_id : 1, n : 1 }, named files_id_1_n_1, is not created with the unique flag. As a consequence, when running in a multi-threaded environment (using multiple connections), every so many file uploads you get a "Command 'filemd5' failed: exception: chunks out of order" error. Changing the index creation to include the unique flag fixes the issue. Or am I missing something here and is the git code indeed correct? Also note that the index name checked in the else branch is missing the 1 at the end of the name ("files_id_1_n" instead of "files_id_1_n_1"). |
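For reference, the index change being requested can be sketched with the 1.x driver's builder API roughly like this (a sketch only, assuming a MongoDatabase variable named database; the driver's actual internal code may differ):

```csharp
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

// Ensure the GridFS chunks index { files_id : 1, n : 1 } exists
// with the unique flag set, as the GridFS spec calls for.
MongoCollection<BsonDocument> chunks = database.GridFS.Chunks;
chunks.EnsureIndex(
    IndexKeys.Ascending("files_id", "n"),
    IndexOptions.SetUnique(true));
```

The unique flag makes the server reject a duplicate (files_id, n) pair at insert time instead of silently storing a second copy of a chunk.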
| Comments |
| Comment by Daniel Knippers [ 30/Aug/11 ] | ||||
|
Both the RequestStart and the SafeMode.True options (tested separately, of course) resolve the "chunks out of order" error when the index is not set to unique. Note that I had already inserted about 36 million zipped xml documents into a test database without the "chunks out of order" error, just by creating the files_id/n index with the unique option. That makes me wish all the more that I could explain the issue. | ||||
| Comment by Robert Stam [ 30/Aug/11 ] | ||||
|
I agree. I also wish I could explain why changing the index to unique seemed to fix the "chunks out of order" error. I suspect it didn't really fix it, but merely lowered the probability of seeing it enough that you weren't seeing it any more. The RequestStart (or SafeMode.True) fix does make sense, though. If you have time to test that change to your code without changing the index to unique, that would help verify the theory that the chunks haven't all arrived at the server by the time the md5 command runs. | ||||
| Comment by Daniel Knippers [ 30/Aug/11 ] | ||||
|
Thanks for the feedback and support. Adding the database.RequestStart() makes sense. Still I am left with a slightly strange feeling as to why the index fix solved the issue (I like to be able to explain issues for 100%, especially when it comes to databases). | ||||
| Comment by Robert Stam [ 29/Aug/11 ] | ||||
|
Good info. So looking at the code I see two things: 1. You are using the stream-based API (the stream is returned by fileInfo.Open); I think you need to surround your code in the StoreFile method with a database.RequestStart() call (and the matching end of the request). 2. SafeMode is off.
In the absence of RequestStart, writes to the gridFile stream can be distributed over many connections. And since SafeMode is off, the writes can easily back up and be queued in the connections' output buffers. My theory in this scenario is that after the file has been uploaded and the md5 command is sent to the server, not all the chunks have made it to the server yet. Turning SafeMode on would also fix the "chunks out of order" error because it has the side effect of waiting after every write to make sure the write succeeded. With SafeMode on you wouldn't need the RequestStart, but it wouldn't hurt either. Still not sure how the unique index helps suppress the "chunks out of order" error, but it must have some indirect side effect, as you have suggested. | ||||
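The suggested change can be sketched as follows (a sketch, reusing the fileInfo variable from the StoreFile snippet below; in the 1.x driver, RequestStart returns a disposable that ends the request when disposed):

```csharp
// Pin all operations in this scope to a single connection, so the
// filemd5 command cannot reach the server ahead of queued chunk writes.
using (database.RequestStart())
{
    using (MongoGridFSStream gridFile =
        fileInfo.Open(FileMode.Create, FileAccess.ReadWrite))
    {
        // write the file contents to gridFile here
    }
}
```

Alternatively, enabling SafeMode makes the driver wait for an acknowledgement after each write, which guarantees the same ordering at the cost of a round trip per chunk.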
| Comment by Daniel Knippers [ 29/Aug/11 ] | ||||
|
Slightly simplified method that is used to store a file:

```csharp
private MongoDBRef StoreFile(string FileName)
{
    MongoGridFSFileInfo fileInfo = database.GridFS.FindOne(FileName);
    if (fileInfo == null)
    {
        MongoGridFSCreateOptions options = new MongoGridFSCreateOptions();
        options.ContentType = "application/zip";
        fileInfo = new MongoGridFSFileInfo(database.GridFS, FileName, options);
        using (MongoGridFSStream gridFile = fileInfo.Open(FileMode.Create, FileAccess.ReadWrite))
        {
            // ... file contents are written to gridFile here ...
        }
    }
    return new MongoDBRef(database.Name, database.GridFS.Files.Name, fileInfo.Id);
}
```
| ||||
| Comment by Robert Stam [ 29/Aug/11 ] | ||||
|
Can you share some details about how you upload your GridFS files? Are you using the Upload method of MongoGridFS, or are you using one of the stream based APIs? Are you using SafeMode? That might help me figure out how there could be chunks missing. | ||||
| Comment by Daniel Knippers [ 29/Aug/11 ] | ||||
|
Agreed that the files_id values are different. The only thing I can think of is that enforcing the unique index uses some sort of locking that in effect acts as a synchronized/lock mechanism for the file insert + md5 hash command. | ||||
| Comment by Robert Stam [ 29/Aug/11 ] | ||||
|
If two threads were uploading two GridFS files simultaneously, wouldn't the files_id values be different so there wouldn't even be a chance of collisions in the first place? | ||||
| Comment by Daniel Knippers [ 29/Aug/11 ] | ||||
|
Thanks for fixing. I don't have a validated explanation, but what I can think of is that with the unique index, two connections wanting to insert a GridFS file (and thus chunks) at exactly the same time get the benefit of the db engine using some sort of locking to make sure the index uniqueness is enforced. | ||||
| Comment by Robert Stam [ 29/Aug/11 ] | ||||
|
I have incorporated your suggested changes. You are correct that the driver was not following the GridFS spec exactly. However, I cannot explain why changing this index to unique would prevent the "chunks out of order" error message. This error message can be the result of either of two things: a missing chunk or a duplicate chunk. The unique index would prevent duplicate chunks, but not missing chunks. I suspect users seeing this error message are actually missing chunks, in which case the unique index would make no difference. If you have an explanation for how changing this index to unique would resolve the "chunks out of order" error I would be very interested. |