[CSHARP-1682] Race condition could result in GridFS indexes not being created Created: 01/Jun/16 Updated: 23/Sep/16 Resolved: 07/Jun/16 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | GridFS |
| Affects Version/s: | 2.2.4 |
| Fix Version/s: | 2.3 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Alessandro Catale | Assignee: | Robert Stam |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
Original Description
We use the C# GridFS Package 2.2.3/2.2.4 against a Replica Set running MongoDB 3.2.6 and WiredTiger. We're facing the following Problem:
We compared various installations of our application on different platforms (DEV, INT, STAGING, PROD). After manually creating the "files_id_1_n_1" index on the .chunks collection, the platform worked as expected. After analysing the C# driver code on GitHub, we understood that the indexes are created by the driver (and not the server). Under which conditions is this index created or not created?
This issue is really urgent for us because it affects our caching strategy, and recreating those indexes on our production environment takes too much time. When we recreate that index, MongoDB takes all available tickets, which makes our application unusable! We can't even connect to the primary through the shell or MongoChef.
Best regards, Office +41 58 221 48 55 Swisscom IT Services
Diagnosis
There is a race condition in the EnsureIndexes method: https://github.com/mongodb/mongo-csharp-driver/blob/0ad339c4c889076680245c3786bddf3ddd2654e3/src/MongoDB.Driver.GridFS/GridFSBucket.cs#L768
If two threads attempt to upload a file at the same time, it is possible that the first thread will acquire the __ensuredIndexes lock on line 770 and then see a non-empty collection on line 773 because the second thread might have uploaded a different file in the meantime. |
| Comments |
| Comment by Githook User [ 07/Jun/16 ] | ||||||||||||||
|
Author: rstam (robert@robertstam.org) Message: | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
I think another workaround would be for each thread to create its own instance of GridFSBucket. That would also eliminate the race condition because each bucket would only have one thread using it. Creating an instance of GridFSBucket is fairly cheap. So this workaround would not be expensive. It would result in each thread probing the files collection to see if it is empty, but that's exactly what the other workaround I suggested does. | ||||||||||||||
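The per-thread bucket workaround described in this comment could look roughly like the following. This is a minimal sketch, assuming driver 2.2 or later (for the synchronous `UploadFromBytes`) and the default bucket options; the class and method names are hypothetical and not part of the driver.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

public static class PerThreadBucketUploader // hypothetical name, for illustration only
{
    // Call this from each worker thread instead of sharing one GridFSBucket.
    public static ObjectId Upload(IMongoDatabase database, string filename, byte[] content)
    {
        // A fresh bucket per call: no other thread uses this instance,
        // so its one-time index check cannot race with another upload
        // on the same bucket object.
        var bucket = new GridFSBucket(database);

        // The bucket probes the files collection on the first upload and
        // creates the GridFS indexes if the collection is empty.
        return bucket.UploadFromBytes(filename, content);
    }
}
```

The trade-off is one extra probe of the files collection per bucket instance, which matches the cost noted in the comment.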
| Comment by Alessandro Catale [ 01/Jun/16 ] | ||||||||||||||
|
Thanks! We'll test it asap and I'll give you feedback asap! | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
I don't have a workaround for re-creating the missing indexes. They have to be created for GridFS to work. Your only options are to create them in the foreground or the background. | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
Here's a workaround that would prevent the issue from occurring again with the current driver. Since the problem is that there is a race condition that could result in the indexes not being created, just call this helper method first before uploading a file:
This will do a quick round trip to the server to probe whether the files collection is empty or not, and if it is it will create the indexes. It doesn't matter if multiple threads happen to see an empty collection at the same time. The server knows how to handle multiple simultaneous create index requests for the same index and will only create the index once. | ||||||||||||||
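The helper method referenced in this comment did not survive in this export. A minimal sketch of what such a helper might look like is given here; it is not the code originally posted. It assumes driver 2.2 or later, the default bucket name `fs`, and the index keys defined by the GridFS specification (`filename` and `uploadDate` on the files collection, a unique index on `files_id` and `n` on the chunks collection). The class and method names are hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

public static class GridFSIndexHelper // hypothetical name, for illustration only
{
    // Probes the files collection and creates the GridFS indexes only if it is empty.
    public static void EnsureIndexes(IMongoDatabase database, string bucketName = "fs")
    {
        var files = database.GetCollection<BsonDocument>(bucketName + ".files");
        var chunks = database.GetCollection<BsonDocument>(bucketName + ".chunks");

        // One round trip: fetch at most one document to see whether anything was uploaded yet.
        var isEmpty = files.Find(new BsonDocument()).Limit(1).FirstOrDefault() == null;
        if (!isEmpty)
        {
            // Non-empty collection: do nothing, to avoid a costly index build on existing data.
            return;
        }

        // Index keys as defined by the GridFS specification (assumption stated above).
        files.Indexes.CreateOne(
            Builders<BsonDocument>.IndexKeys.Ascending("filename").Ascending("uploadDate"));
        chunks.Indexes.CreateOne(
            Builders<BsonDocument>.IndexKeys.Ascending("files_id").Ascending("n"),
            new CreateIndexOptions { Unique = true });
    }
}
```

Under these assumptions, an application would call `GridFSIndexHelper.EnsureIndexes(database);` before each upload. No client-side lock is used, because simultaneous create-index requests for the same index are handled by the server, as the comment explains.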
| Comment by Alessandro Catale [ 01/Jun/16 ] | ||||||||||||||
|
OK, would you like to Skype tomorrow? I'll arrange for our developers to be available as well. Would your workaround create the index on a secondary? That's not a solution for us, and it will still take too much time. We also couldn't resync the node last time (we took it offline in order to be able to drop these cache collections). We also opened a ticket: https://jira.mongodb.org/browse/CS-31114 | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
We don't have a release planned yet for the near future. I can help you come up with a workaround. That would solve your problem in the short term and be much faster than waiting for the next release. | ||||||||||||||
| Comment by Alessandro Catale [ 01/Jun/16 ] | ||||||||||||||
|
Thank you very much! I tested it and got the same results. It's really urgent because recreating the chunks index nearly crashes our production cacheDb. It seems that MongoDB takes all available tickets just for the index rebuild! We have 3'500'000 items and we're just at the beginning... When can we expect a solution for this issue? | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
I would like to add that there is no safe way to drop the GridFS collections while an application that uses them is running. The issue is that no matter which collection you drop first there could be consistency problems between the two. There is no command to the server to drop two collections atomically. There may be an upload or download in progress when you drop the collections, and that will result in either errors or a corrupted upload. | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
Yes, that sounds plausible. If the GridFS collections are dropped while the application is running the application won't notice and won't re-create the indexes. And restarting the application when the collections are no longer empty also won't re-create the indexes. And it also looks like the race condition in EnsureIndexes could result in the indexes not getting created in the first place. We'll fix that. | ||||||||||||||
| Comment by Alessandro Catale [ 01/Jun/16 ] | ||||||||||||||
|
In a first test, we can confirm this:
So we thought about how this could have happened:
Do you think this could be the case? | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
If you dropped the collections without restarting the applications the indexes would not have been re-created. Each process only checks once whether the indexes need to be created. It doesn't check repeatedly. And yes, GridFS would work without the indexes but would be doing full collection scans, so as soon as the collections got big enough the GridFS operations would slow significantly. | ||||||||||||||
| Comment by Alessandro Catale [ 01/Jun/16 ] | ||||||||||||||
|
We also dropped the collection, and the indexes were not created. We also found out that it runs without indexes, but after a certain time it becomes unusable. | ||||||||||||||
| Comment by Alessandro Catale [ 01/Jun/16 ] | ||||||||||||||
|
Thank you, Robert, for your super fast answer! We set up a new environment with an empty replica set, the latest stable driver and DB, and noticed the same problem: the driver created the collections, but on uploading the first files, the GridFS-specific indexes were not created. We then retested the same procedure with a local installation (no replica set, no authentication, etc.), and the indexes got created. It smells like the problem you described. We use GridFS as a caching storage for pictures and videos. The likelihood of hitting the DB in rapid succession is high. | ||||||||||||||
| Comment by Robert Stam [ 01/Jun/16 ] | ||||||||||||||
|
The indexes needed by GridFS are created when the very first file is uploaded to GridFS and the collection is empty. The GridFS code itself never deletes these indexes. But if these indexes are ever deleted, they will not magically come back. They will have to be manually recreated (which, as you have noted, is resource intensive if the GridFS collections are large). Looking at the GridFS code, it looks like there might be a race condition if multiple threads simultaneously attempt to upload the very first GridFS file. But if you've already uploaded thousands of GridFS files, this probably did not affect you.