[CDRIVER-4296] mongoc_gridfs_file_set_id() does not work when the file has many chunks. Created: 16/Feb/22 Updated: 27/Oct/23 Resolved: 05/Apr/22 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | GridFS |
| Affects Version/s: | 1.21.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | kevin wanglong_ | Assignee: | Jesse Williamson (Inactive) |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | needs-first-responder | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
| Comments |
| Comment by PM Bot [ 05/Apr/22 ] |
|
There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information. |
| Comment by Jesse Williamson (Inactive) [ 21/Mar/22 ] |
|
To highlight what I think is the most straightforward workaround: use auto-id generation. Rather than setting the id manually as in the example program, use the one assigned by GridFS. See also the discussion here: |
| Comment by Jesse Williamson (Inactive) [ 19/Mar/22 ] |
|
This is caused by behavior in the deprecated mongoc_gridfs_t API, which does not conform to the current GridFS API specification. It may be sufficient to update the C Driver's GridFS example program and add a note to the documentation. For cause, reproduction, and discussion, see above. |
| Comment by Jesse Williamson (Inactive) [ 19/Mar/22 ] |
|
Thank you again for reporting this issue, and for your patience while it was investigated.
Unfortunately, I see no way to check the underlying status of the is_dirty flag through the mongoc_gridfs_t API, and without that, checking whether the stream has been written appears to be possible only indirectly.
Instead, you're encouraged to use the newer mongoc_gridfs_bucket_t GridFS API and to upgrade from mongoc_gridfs_t if possible. The mongoc_gridfs_t implementation used by the example program does not comply with the GridFS specification and has been deprecated.
You can read further information about the C Driver's support for GridFS and possible migration strategies from
You can learn more about the GridFS API specification here:
The deprecated mongoc_gridfs_t API, for reference:
I've included a discussion and details on why you are seeing this behavior below. I hope that is helpful! Thank you again for your effort in bringing this issue to our attention!
-Jesse
Discussion:
Under the right circumstances (such as being asked to seek to 0 in a large file: 2GB, standard chunk size), an unsaved "new" file created by mongoc_gridfs_create_file_from_stream() can wind up with its "is_dirty" flag un-set. This means that operations like changing its id aren't allowed on it, even though the user has not yet saved it directly, because the id has already been auto-generated and written on account of a hidden mongoc_gridfs_file_seek() call.
This is inconsistent with the behavior of the same calls on smaller, single-chunk files, which still have "is_dirty" set after the mongoc_gridfs_create_file_from_stream() and/or mongoc_gridfs_file_seek() call. Neither mongoc_gridfs_create_file_from_stream()'s nor mongoc_gridfs_file_seek()'s documentation indicates this side effect, and neither call produces an error.
mongoc_gridfs_create_file_from_stream()'s documentation says it returns a "newly allocated" file, and there is a note that it will read the stream until EOF; mongoc_gridfs_file_seek()'s documentation does not mention affecting the new-ness state of the file (making this behavior a bit surprising).
In our example program, example-gridfs, this can be observed with the method suggested by the submitter:
    fallocate -l 1KB foo-1kb
    ./example-gridfs write foo ./foo-1kb
...and then:
    ./example-gridfs write bar ./bar-2gb
Notice that the first file uses the value set by the example program, but the large file uses an auto-generated id:
    [ , { _id: ObjectId("62351e38952337de1d0f8be1"), chunkSize: 261120, filename: 'bar', length: Long("2000000000"), uploadDate: ISODate("2022-03-19T00:05:12.207Z") }]
This is because mongoc_gridfs_create_file_from_stream(), called at example-gridfs.c:116, has itself called mongoc_gridfs_file_seek(). Our example program assumes that the file is still new (and therefore, indirectly, that "is_dirty" is still set), so when mongoc_gridfs_file_set_id() is called at example-gridfs.c:130, the function fails because the stream has already been written to a file, as described above. Note that the "mongofiles" Go tool generates ids on its own in every case rather than relying on autogeneration, so it doesn't see this issue.
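To make the sequence above concrete, here is a toy model in plain C. It is only a sketch of the behavior as described in this comment, under the assumption that a seek which leaves the current chunk forces a save. Every name in it (toy_file_t, toy_save, toy_seek, toy_create_from_stream, toy_set_id) is hypothetical, and none of it is libmongoc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for a GridFS file handle. All names are hypothetical;
 * this is NOT the libmongoc struct or its real code paths. */
typedef struct {
   int64_t pos;        /* current read/write offset in bytes */
   int32_t chunk_size; /* bytes per chunk */
   bool page_dirty;    /* the in-memory chunk holds unsaved bytes */
   bool is_dirty;      /* the file is still "new" (not yet saved) */
   char id[32];        /* file id as a string, for illustration only */
} toy_file_t;

/* Saving writes the pending chunk and the file metadata (including the id),
 * after which the file no longer counts as new. */
static void
toy_save (toy_file_t *f)
{
   assert (f != NULL);
   f->page_dirty = false;
   f->is_dirty = false;
}

/* Assumption: seeking into a different chunk saves the pending one first. */
static void
toy_seek (toy_file_t *f, int64_t offset)
{
   assert (f != NULL && offset >= 0);
   if (f->page_dirty && offset / f->chunk_size != f->pos / f->chunk_size) {
      toy_save (f);
   }
   f->pos = offset;
}

/* Models reading `length` bytes from a stream, then rewinding to offset 0. */
static toy_file_t
toy_create_from_stream (int64_t length, int32_t chunk_size)
{
   toy_file_t f = {0};
   assert (length >= 0 && chunk_size > 0);
   f.pos = length;
   f.chunk_size = chunk_size;
   f.page_dirty = length > 0;
   f.is_dirty = true;
   snprintf (f.id, sizeof f.id, "%s", "auto-generated");
   toy_seek (&f, 0); /* the hidden rewind described above */
   return f;
}

/* Changing the id is only allowed while the file is still new. */
static bool
toy_set_id (toy_file_t *f, const char *id)
{
   assert (f != NULL && id != NULL);
   if (!f->is_dirty) {
      return false;
   }
   snprintf (f->id, sizeof f->id, "%s", id);
   return true;
}
```

In this model, toy_create_from_stream (1000, 261120) returns a file on which toy_set_id() succeeds, because the rewind stays inside the only chunk. toy_create_from_stream (2000000000LL, 261120) returns a file on which toy_set_id() fails and the auto-generated id is kept, which mirrors the foo/bar difference shown above.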
There are two approaches we might consider. The first is to see if it's possible to avoid the write in both functions, or at least in mongoc_gridfs_create_file_from_stream(). This effort might be disproportionate unless the problem is frequently encountered. One workaround is to do what the "mongofiles" tool does and always generate UUIDs on the client side manually; another is to update the written file chunks when a change of id is needed, which is a potentially expensive operation. In any case, it is probably worthwhile to be sure this behavior is documented: a comment in the example and an update to the deprecated API documentation would be helpful. |
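For readers who need a caller-chosen id, the newer mongoc_gridfs_bucket_t API recommended in the comment above accepts the id at upload time, so there is no later set-id step. The following is a minimal sketch, not the driver's official example: the connection string, the database name "test", and the id "my-custom-id" are placeholder assumptions, and it needs libmongoc and a reachable server to run.

```c
#include <fcntl.h>
#include <mongoc/mongoc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
{
   const char *custom_id = "my-custom-id"; /* placeholder id (assumption) */
   mongoc_client_t *client = NULL;
   mongoc_database_t *db = NULL;
   mongoc_gridfs_bucket_t *bucket = NULL;
   mongoc_stream_t *source = NULL;
   bson_value_t file_id = {0};
   bson_error_t error;
   int rc = EXIT_FAILURE;

   if (argc != 3) {
      fprintf (stderr, "usage: %s <path> <gridfs-filename>\n", argv[0]);
      return EXIT_FAILURE;
   }

   mongoc_init ();

   /* Placeholder URI and database name (assumptions). */
   client = mongoc_client_new ("mongodb://localhost:27017");
   if (!client) {
      fprintf (stderr, "could not create client\n");
      goto done;
   }
   db = mongoc_client_get_database (client, "test");

   bucket = mongoc_gridfs_bucket_new (db, NULL, NULL, &error);
   if (!bucket) {
      fprintf (stderr, "bucket: %s\n", error.message);
      goto done;
   }

   source = mongoc_stream_file_new_for_path (argv[1], O_RDONLY, 0);
   if (!source) {
      perror ("open");
      goto done;
   }

   /* The id is supplied up front, as a UTF-8 string value here. */
   file_id.value_type = BSON_TYPE_UTF8;
   file_id.value.v_utf8.str = (char *) custom_id;
   file_id.value.v_utf8.len = (uint32_t) strlen (custom_id);

   if (!mongoc_gridfs_bucket_upload_from_stream_with_id (
          bucket, &file_id, argv[2], source, NULL, &error)) {
      fprintf (stderr, "upload: %s\n", error.message);
      goto done;
   }
   rc = EXIT_SUCCESS;

done:
   if (source) {
      mongoc_stream_destroy (source);
   }
   if (bucket) {
      mongoc_gridfs_bucket_destroy (bucket);
   }
   if (db) {
      mongoc_database_destroy (db);
   }
   if (client) {
      mongoc_client_destroy (client);
   }
   mongoc_cleanup ();
   return rc;
}
```

Because the id is fixed in this sketch, running it twice against the same bucket should be expected to fail the second time with a duplicate-id error; a real program would generate a fresh id per file.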
| Comment by Jesse Williamson (Inactive) [ 17/Feb/22 ] |
|
Hello, thank you for reporting this issue! We will make time to investigate and compare it with |
| Comment by kevin wanglong_ [ 16/Feb/22 ] |
|
I had the same problem as https://jira.mongodb.org/browse/CDRIVER-1976. It works normally when the file has only one chunk. |
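Whether a file has one chunk or many follows from its length and chunk size. As a worked example, the sketch below computes the chunk count for the two files used in the reproduction, taking the chunkSize (261120) and length (2000000000) from the files document quoted in the discussion, and assuming that fallocate's "1KB" means 1000 bytes. The helper names are hypothetical and are not part of libmongoc.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunks needed to store `length` bytes: ceil(length / chunk_size).
 * Hypothetical helper, not a libmongoc function. */
static int64_t
example_chunk_count (int64_t length, int32_t chunk_size)
{
   assert (length >= 0 && chunk_size > 0);
   return (length + chunk_size - 1) / chunk_size;
}

/* Size in bytes of the final chunk (0 for an empty file). */
static int64_t
example_last_chunk_bytes (int64_t length, int32_t chunk_size)
{
   int64_t rem;

   assert (length >= 0 && chunk_size > 0);
   if (length == 0) {
      return 0;
   }
   rem = length % chunk_size;
   return rem == 0 ? chunk_size : rem;
}
```

With these helpers, example_chunk_count (1000, 261120) is 1, while example_chunk_count (2000000000LL, 261120) is 7660, with a final chunk of 81920 bytes. The small file therefore fits in a single chunk, and the large one spans thousands of them.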