[CXX-769] GridFS chunk retrieval Created: 09/Dec/15  Updated: 11/Sep/19  Resolved: 22/Dec/15

Status: Closed
Project: C++ Driver
Component/s: API
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Joost Meijles Assignee: Unassigned
Resolution: Done Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

To retrieve a chunk from GridFS using the C++ legacy driver I need at minimum the following code:

GridFS gfs(client, "test", "db"); // 2x createIndex()  
GridFile file = gfs.findFile(filename); // 1x findOne
GridFSChunk chunk = file.getChunk(n); // 1x findOne, n = chunk number

This adds considerable overhead (3 extra queries) when I already have a client connection and know which chunk to retrieve.

Is there a way, other than writing my own custom query, to retrieve a single chunk with only one MongoDB query?

Can you explain why the createIndex calls are necessary in the GridFS constructor?



 Comments   
Comment by Samantha Ritter (Inactive) [ 22/Dec/15 ]

Thanks Joost, I'm going to close this ticket. Feel free to open a new one if you have any other questions.

Comment by Joost Meijles [ 22/Dec/15 ]

Hi Samantha,

Thanks for the clarification, that completely answers my question.

Comment by Samantha Ritter (Inactive) [ 18/Dec/15 ]

Hi Joost,

Correct, there is still a race. The GridFS protocol is inherently racy, and will be until mongod supports some sort of atomic document access across collections. Running findFile() directly before you retrieve a chunk just makes that window smaller. If you're planning to access the underlying collections directly then eh, it's up to you!

We only check for those indexes once, when the GridFS object is constructed. The object's construction itself is unrelated to either insertion or retrieval. The most efficient thing to do is create a GridFS object in your application and keep it around to use for all of your grid operations. Then, those two extra network calls only happen once. Because the indexes are on two different collections, there is no way to check for them with fewer than two network roundtrips. So, unfortunately, no plans to optimize that.
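
To put rough numbers on that, here is a small self-contained sketch that only counts round trips. FakeServer and GridFsModel are hypothetical stand-ins written for this illustration, not the driver's classes; the model simply assumes two createIndex calls in the constructor and two findOne calls per chunk read, as described in this ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical fake server: it does no real work and only counts round trips.
class FakeServer {
public:
    FakeServer() : roundTrips_(0) {}

    // Creating an index that already exists changes nothing,
    // but the request still costs one round trip.
    void createIndex(const std::string& ns) {
        ++roundTrips_;
        indexes_.insert(ns);
    }

    void findOne(const std::string& /*ns*/) { ++roundTrips_; }

    int roundTrips() const { return roundTrips_; }
    std::size_t indexCount() const { return indexes_.size(); }

private:
    int roundTrips_;
    std::set<std::string> indexes_;
};

// Hypothetical model of a GridFS object: the constructor ensures both indexes.
class GridFsModel {
public:
    GridFsModel(FakeServer& server, const std::string& db, const std::string& prefix)
        : server_(server),
          filesNs_(db + "." + prefix + ".files"),
          chunksNs_(db + "." + prefix + ".chunks") {
        server_.createIndex(filesNs_);
        server_.createIndex(chunksNs_);
    }

    // Models findFile() followed by getChunk(): two findOne calls.
    void readChunk() {
        server_.findOne(filesNs_);
        server_.findOne(chunksNs_);
    }

private:
    FakeServer& server_;
    std::string filesNs_;
    std::string chunksNs_;
};

// Strategy 1: construct a new GridFsModel for every chunk read.
int roundTripsConstructingEachTime(int reads) {
    assert(reads >= 0);
    FakeServer server;
    for (int i = 0; i < reads; ++i) {
        GridFsModel gfs(server, "test", "db");
        gfs.readChunk();
    }
    return server.roundTrips();
}

// Strategy 2: construct one GridFsModel and reuse it for every chunk read.
int roundTripsReusingOneObject(int reads) {
    assert(reads >= 0);
    FakeServer server;
    GridFsModel gfs(server, "test", "db");
    for (int i = 0; i < reads; ++i) {
        gfs.readChunk();
    }
    return server.roundTrips();
}
```

Under these assumptions, ten chunk reads cost 40 round trips when the object is constructed each time and 22 when one object is reused, so the two index checks are paid only once.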

Comment by Joost Meijles [ 18/Dec/15 ]

Hi Samantha,

Thanks for your reply.

> The reason the driver's gridfs class forces you to call findFile() before getChunk() is to reduce the chance of a race condition with another client.
Understood, but since this does not fully eliminate the possibility of a race condition, I still have to handle the exception that could be raised. So I don't see the advantage of calling findFile() when I already know the file id.
Is this correct?

> In order to work efficiently, gridfs needs two indexes, one on the "files" collection and one on the "chunks" collection.
I would expect this check to happen only on insertion, not on retrieval as well.
Are there plans to optimize this?

Comment by Samantha Ritter (Inactive) [ 17/Dec/15 ]

Hi there,

It is possible for you to query GridFS's underlying collections directly, though we don't recommend doing so. If you already know the file id, the collection prefix (yours is "db"), and the chunk number, you could simply run a findOne on "db.chunks" for a document with the right file id and chunk number.
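
As a rough sketch of what that single lookup amounts to, the following models the chunks collection in memory. ChunkCollection and chunksNamespace are hypothetical names made up for this illustration, not part of the driver; files_id and n are the GridFS chunk-document fields, and a plain string stands in for the file id to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a GridFS chunks collection.
// Each entry mirrors a chunk document: {files_id, n, data}.
class ChunkCollection {
public:
    void insert(const std::string& filesId, int n, const std::string& data) {
        assert(n >= 0);  // chunk numbers start at 0
        chunks_[std::make_pair(filesId, n)] = data;
    }

    // Models one findOne on {files_id: filesId, n: n}.
    // Returns a pointer to the payload, or nullptr when no chunk matches.
    const std::string* findOne(const std::string& filesId, int n) const {
        auto it = chunks_.find(std::make_pair(filesId, n));
        return it == chunks_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::string, int>, std::string> chunks_;
};

// Builds "<database>.<prefix>.chunks", e.g. "test.db.chunks" for
// a GridFS created on database "test" with prefix "db".
std::string chunksNamespace(const std::string& database, const std::string& prefix) {
    return database + "." + prefix + ".chunks";
}
```

With the real driver the equivalent would be one findOne against that namespace with both fields in the query; a miss (nullptr here) is what you would see if the file had been removed in the meantime.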

The reason the driver's gridfs class forces you to call findFile() before getChunk() is to reduce the chance of a race condition with another client. For example, say you stored "file.txt" in gridfs, and kept track of the chunk number of some chunk to use later. In the meantime, another client calls removeFile() on the same database, which removes all of "file.txt"'s chunks. If you later went to read the chunk without first calling findFile(), you would fail to find the chunk.
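
The scenario above can be modelled with a small in-memory sketch. GridStore, readDirect and readChecked are hypothetical names invented for this illustration and do not correspond to driver functions; the removedInBetween flag is an assumption used only to simulate another client removing the file between the two lookups.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory model of the two GridFS collections.
struct GridStore {
    std::set<std::string> files;                                 // "files": one entry per file id
    std::map<std::pair<std::string, int>, std::string> chunks;  // "chunks": (files_id, n) -> data

    void storeChunk(const std::string& id, int n, const std::string& data) {
        assert(n >= 0);
        files.insert(id);
        chunks[std::make_pair(id, n)] = data;
    }

    // Removes the file entry and every chunk that belongs to it.
    void removeFile(const std::string& id) {
        files.erase(id);
        auto it = chunks.lower_bound(std::make_pair(id, 0));
        while (it != chunks.end() && it->first.first == id) {
            it = chunks.erase(it);
        }
    }

    bool hasFile(const std::string& id) const { return files.count(id) != 0; }
    bool hasChunk(const std::string& id, int n) const {
        return chunks.count(std::make_pair(id, n)) != 0;
    }
};

enum ReadResult { CHUNK_FOUND, FILE_GONE, CHUNK_MISSING };

// Direct read: a single lookup on "chunks" only.
ReadResult readDirect(const GridStore& store, const std::string& id, int n) {
    return store.hasChunk(id, n) ? CHUNK_FOUND : CHUNK_MISSING;
}

// Checked read: look up the file first, then the chunk.
// removedInBetween simulates another client removing the file
// after the first lookup but before the second.
ReadResult readChecked(GridStore& store, const std::string& id, int n,
                       bool removedInBetween = false) {
    if (!store.hasFile(id)) return FILE_GONE;
    if (removedInBetween) store.removeFile(id);
    return store.hasChunk(id, n) ? CHUNK_FOUND : CHUNK_MISSING;
}
```

In this model the checked read reports a removed file up front, but a removal that lands between the two lookups still produces a missing chunk, so the caller has to handle that case either way.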

In order to work efficiently, gridfs needs two indexes, one on the "files" collection and one on the "chunks" collection. If those indexes already exist, calling createIndex() has no effect and does no unnecessary work. It still adds a network roundtrip, because a newly constructed gridfs object has no way of knowing whether those indexes have been created without asking the server.

Generated at Wed Feb 07 22:00:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.