[CXX-769] GridFS chunk retrieval Created: 09/Dec/15 Updated: 11/Sep/19 Resolved: 22/Dec/15 |
|
| Status: | Closed |
| Project: | C++ Driver |
| Component/s: | API |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Joost Meijles | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | legacy-cxx | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
To retrieve a chunk from GridFS using the C++ legacy driver I need at minimum the following code:
This adds considerable overhead (3 extra queries) when I already have a client connection and know which chunk to retrieve. Is there a way, other than writing my own custom query, to retrieve one chunk with a single MongoDB query? Can you explain to me why the createIndex calls are necessary in the GridFS constructor? |
| Comments |
| Comment by Samantha Ritter (Inactive) [ 22/Dec/15 ] |
|
Thanks Joost, I'm going to close this ticket. Feel free to open a new one if you have any other questions. |
| Comment by Joost Meijles [ 22/Dec/15 ] |
|
Hi Samantha, Thanks for the clarification, that completely answers my question. |
| Comment by Samantha Ritter (Inactive) [ 18/Dec/15 ] |
|
Hi Joost, Correct, there is still a race. The GridFS protocol is inherently racy, and will be until mongod supports some sort of atomic document access across collections. Running findFile() directly before you retrieve a chunk just makes that window smaller. If you're planning to access the underlying collections directly then eh, it's up to you! We only check for those indexes once, when the GridFS object is constructed. The object's construction itself is unrelated to either insertion or retrieval. The most efficient thing to do is create one GridFS object in your application and keep it around for all of your GridFS operations. Then those two extra network calls happen only once. Because the indexes are on two different collections, there is no way to check for them with fewer than two network roundtrips. So, unfortunately, there are no plans to optimize that. |
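The cost difference described above can be sketched with a small round-trip counter. This is only a model, not driver code: `CountingConnection` and `GridStore` are hypothetical stand-ins for a client connection and the GridFS class, and the counts (two index checks at construction, one query each for findFile() and getChunk()) are taken from this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a client connection; it only counts round trips.
struct CountingConnection {
    int roundTrips = 0;
};

// Hypothetical stand-in for the driver's GridFS class (not the real API).
class GridStore {
public:
    explicit GridStore(CountingConnection& conn) : conn_(conn) {
        // One index check per collection ("files" and "chunks"), paid at construction.
        conn_.roundTrips += 2;
    }

    // Models findFile() followed by getChunk(): one query each.
    void readChunk(const std::string& fileName, int n) {
        (void)fileName;
        (void)n;
        conn_.roundTrips += 2;
    }

private:
    CountingConnection& conn_;
};

// Constructs a new GridStore for every read, so the index checks repeat each time.
inline int roundTripsWithFreshObject(int reads) {
    assert(reads >= 0);
    CountingConnection conn;
    for (int i = 0; i < reads; ++i) {
        GridStore gfs(conn);
        gfs.readChunk("file.txt", i);
    }
    return conn.roundTrips;
}

// Constructs one GridStore up front and reuses it, so the index checks happen once.
inline int roundTripsWithSharedObject(int reads) {
    assert(reads >= 0);
    CountingConnection conn;
    GridStore gfs(conn);
    for (int i = 0; i < reads; ++i) {
        gfs.readChunk("file.txt", i);
    }
    return conn.roundTrips;
}
```

Under this model, 100 chunk reads cost 400 round trips with a fresh object per read and 202 with one shared object. For a single read the two approaches cost the same.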
| Comment by Joost Meijles [ 18/Dec/15 ] |
|
Hi Samantha, Thanks for your reply.
> The reason the driver's gridfs class forces you to call findFile() before getChunk() is to reduce the chance of a race condition with another client.
> In order to work efficiently, gridfs needs two indexes, one on the "files" collection and one on the "chunks" collection. |
| Comment by Samantha Ritter (Inactive) [ 17/Dec/15 ] |
|
Hi there, It is possible for you to query GridFS's underlying collections directly, though we don't recommend doing so. If you already know the file id, the collection prefix (yours is "db") and the chunk number, you can simply run a findOne on "db.chunks" for a document with the right file id and chunk number. The reason the driver's gridfs class forces you to call findFile() before getChunk() is to reduce the chance of a race condition with another client. For example, say you stored "file.txt" in gridfs, and kept track of the chunk number of some chunk to use later. In the meantime, another client calls removeFile() on the same database, which removes all of "file.txt"'s chunks. If you later went to read the chunk without first calling findFile(), you would fail to find the chunk. In order to work efficiently, gridfs needs two indexes, one on the "files" collection and one on the "chunks" collection. If those indexes already exist, then calling createIndex() has no effect and does no unnecessary work. It still adds a network roundtrip, because a newly-constructed gridfs object has no way of knowing whether those indexes have been created without asking the server.
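The direct lookup described above relies on the fact that each chunk document stores the owning file's id ("files_id") and its zero-based position ("n"). A minimal sketch of that keying, using an in-memory map as a hypothetical stand-in for the chunks collection rather than real driver calls (`ChunkKey`, `ChunkStore`, `chunkNumberFor` and `findChunk` are illustrative names, not part of the driver):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Key of one chunk document: the owning file's id plus the chunk's position.
struct ChunkKey {
    std::string files_id;  // id of the owning document in the "files" collection
    std::int64_t n;        // zero-based chunk number within that file

    bool operator<(const ChunkKey& other) const {
        return std::tie(files_id, n) < std::tie(other.files_id, other.n);
    }
};

// Hypothetical in-memory stand-in for the chunks collection (key -> chunk data).
using ChunkStore = std::map<ChunkKey, std::string>;

// Zero-based number of the chunk that contains the given byte offset.
inline std::int64_t chunkNumberFor(std::int64_t byteOffset, std::int64_t chunkSize) {
    assert(chunkSize > 0 && byteOffset >= 0);
    return byteOffset / chunkSize;
}

// Models a single findOne on the chunks collection, with no preceding findFile().
// Returns nullptr when no chunk matches, for example because another client
// removed the file in the meantime.
inline const std::string* findChunk(const ChunkStore& chunks,
                                    const std::string& filesId,
                                    std::int64_t n) {
    ChunkStore::const_iterator it = chunks.find(ChunkKey{filesId, n});
    if (it == chunks.end()) {
        return nullptr;
    }
    return &it->second;
}
```

The nullptr case is the race discussed in this thread: a caller that skips findFile() has to treat a missing chunk as a normal outcome rather than an error in its own bookkeeping.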