<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Wed Feb 07 21:20:31 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[CDRIVER-4296] mongoc_gridfs_file_set_id() does not work when the file has many chunks.</title>
                <link>https://jira.mongodb.org/browse/CDRIVER-4296</link>
                <project id="10030" key="CDRIVER">C Driver</project>
                    <description>&lt;h4&gt;&lt;a name=&quot;Summary&quot;&gt;&lt;/a&gt;Summary&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;mongoc_gridfs_file_set_id() does not work when the file has many chunks.&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;&lt;a name=&quot;Environment&quot;&gt;&lt;/a&gt;Environment&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;version: mongo-c-driver-1.21.0&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;host: debian 11 64-bit x86&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;gcc: gcc (Debian 10.2.1-6) 10.2.1 20210110&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;
&lt;h4&gt;&lt;a name=&quot;HowtoReproduce&quot;&gt;&lt;/a&gt;How to Reproduce&lt;/h4&gt;

&lt;p&gt;I used the example-gridfs tool to upload a large file, but the resulting fs.files._id is not as expected.&lt;/p&gt;

&lt;p&gt;When I then upload a small file, fs.files._id is as expected.&lt;/p&gt;

&lt;p&gt;test&amp;gt; db.fs.files.find({})&lt;br/&gt;
[&lt;br/&gt;
 &lt;/p&gt;
{
 _id: ObjectId(&quot;620cd61f9f63db1b8d012941&quot;),
 chunkSize: 261120,
 filename: &apos;ss&apos;,
 length: Long(&quot;429273416&quot;),
 uploadDate: ISODate(&quot;2022-02-16T10:46:55.642Z&quot;)
 }
&lt;p&gt;,&lt;/p&gt;
 {
 _id: 1,
 chunkSize: 261120,
 filename: &apos;aa&apos;,
 length: Long(&quot;170027&quot;),
 uploadDate: ISODate(&quot;2022-02-16T10:58:44.562Z&quot;)
 }
&lt;p&gt;]&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="1984525">CDRIVER-4296</key>
            <summary>mongoc_gridfs_file_set_id() does not work when the file has many chunks.</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.mongodb.org/images/icons/priorities/minor.svg">Minor - P4</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13203">Gone away</resolution>
                                        <assignee username="jesse.williamson@mongodb.com">Jesse Williamson</assignee>
                                    <reporter username="wanglong.kevin@gmail.com">kevin wanglong_</reporter>
                        <labels>
                            <label>needs-first-responder</label>
                    </labels>
                <created>Wed, 16 Feb 2022 11:42:44 +0000</created>
                <updated>Fri, 27 Oct 2023 19:50:25 +0000</updated>
                            <resolved>Tue, 5 Apr 2022 12:00:33 +0000</resolved>
                                    <version>1.21.0</version>
                                                    <component>GridFS</component>
                                        <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="4457379" author="dbeng-pm-bot" created="Tue, 5 Apr 2022 12:00:38 +0000"  >&lt;p&gt;There hasn&apos;t been any recent activity on this ticket, so we&apos;re resolving it. Thanks for reaching out! Please feel free to comment on this if you&apos;re able to provide more information.&lt;/p&gt;</comment>
                            <comment id="4424983" author="JIRAUSER1261121" created="Mon, 21 Mar 2022 19:36:34 +0000"  >&lt;p&gt;The most straightforward workaround, I think, is to rely on automatic id generation: rather than setting the id manually as the example program does, use the one assigned by GridFS.&lt;/p&gt;

&lt;p&gt;See also the discussion here:&lt;br/&gt;
&lt;a href=&quot;https://jira.mongodb.org/browse/JAVA-1983&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/JAVA-1983&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4422078" author="JIRAUSER1261121" created="Sat, 19 Mar 2022 00:59:07 +0000"  >&lt;p&gt;This is caused by behavior in the deprecated mongoc_gridfs_t API, which does not conform to the current GridFS API specification. It may be sufficient to update the C Driver&apos;s GridFS example program and add a note to the documentation.&lt;/p&gt;

&lt;p&gt;For cause, reproduction, and discussion, see above.&lt;/p&gt;</comment>
                            <comment id="4422076" author="JIRAUSER1261121" created="Sat, 19 Mar 2022 00:54:50 +0000"  >&lt;p&gt;Thank you again for reporting this issue, and for your patience while it was investigated.&lt;/p&gt;

&lt;p&gt;Unfortunately, I see no way to check the state of the is_dirty flag through the mongoc_gridfs_t API, and without that, checking whether the stream has already been written appears to be possible only indirectly. Another idea is to always generate a UUID on the client side and avoid the stream call (see discussion below), but other than working around the issue I don&apos;t see a direct way of resolving this via the mongoc_gridfs_t API.&lt;/p&gt;

&lt;p&gt;Instead, you&apos;re encouraged to use the newer mongoc_gridfs_bucket_t GridFS API, and upgrade from mongoc_gridfs_t if possible. The mongoc_gridfs_t implementation used by the example program does not comply with the GridFS specification and has been deprecated.&lt;/p&gt;

&lt;p&gt;You can read more about the C Driver&apos;s support for GridFS, and about possible strategies for migrating from&lt;br/&gt;
the old, no longer recommended mongoc_gridfs_t implementation (used by the example program) to the newer mongoc_gridfs_bucket_t implementation, here:&lt;br/&gt;
&lt;a href=&quot;http://mongoc.org/libmongoc/current/gridfs.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://mongoc.org/libmongoc/current/gridfs.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can learn more about the GridFS API specification here:&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/specifications/blob/master/source/gridfs/gridfs-spec.rst#api&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/specifications/blob/master/source/gridfs/gridfs-spec.rst#api&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The deprecated mongoc_gridfs_t API, for reference:&lt;br/&gt;
&lt;a href=&quot;http://mongoc.org/libmongoc/current/mongoc_gridfs_t.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://mongoc.org/libmongoc/current/mongoc_gridfs_t.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&apos;ve included a discussion and details on why you are seeing this behavior below.&lt;/p&gt;

&lt;p&gt;I hope that is helpful! Thank you again for your effort in bringing this issue to our attention!&lt;/p&gt;

&lt;p&gt;-Jesse&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Discussion:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the right circumstances (such as being asked to seek to 0 in a large file, e.g. 2GB at the standard chunk size), an unsaved &quot;new&quot; file created by mongoc_gridfs_create_file_from_stream() can wind up having its &quot;is_dirty&quot; flag un-set. This means that operations like changing its id are no longer allowed, even though the user has not directly saved the file, because the id has already been auto-generated and written on account of a hidden mongoc_gridfs_file_seek() call.&lt;/p&gt;

&lt;p&gt;This is inconsistent with the behavior of the same calls on smaller, single-chunk files, which still have is_dirty set after the mongoc_gridfs_create_file_from_stream() and/or mongoc_gridfs_file_seek() call.&lt;/p&gt;

&lt;p&gt;Neither mongoc_gridfs_create_file_from_stream()&apos;s nor mongoc_gridfs_file_seek()&apos;s documentation indicates this side effect, and neither call produces an error.&lt;/p&gt;

&lt;p&gt;mongoc_gridfs_create_file_from_stream()&apos;s documentation says it returns a &quot;newly allocated&quot; file, and there is a note that it will read the stream until EOF; mongoc_gridfs_file_seek()&apos;s documentation does not mention affecting the new-ness state of the file (making this behavior a bit surprising).&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;To reproduce:&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;In our example program, example-gridfs, this can be observed with the method suggested by the submitter:&lt;/p&gt;

&lt;p&gt;fallocate -l 1KB foo-1kb&lt;br/&gt;
fallocate -l 2GB bar-2gb&lt;/p&gt;

&lt;p&gt;./example-gridfs write foo ./foo-1kb&lt;/p&gt;

&lt;p&gt;...and then:&lt;/p&gt;

&lt;p&gt;./example-gridfs write bar ./bar-2gb&lt;br/&gt;
Cannot set file id after saving file.&lt;/p&gt;

&lt;p&gt;Notice that the first file uses the value set by the example program, but the large file uses an auto-generated id:&lt;br/&gt;
db.fs.files.find({})&lt;/p&gt;

&lt;p&gt;[&lt;br/&gt;
  &lt;/p&gt;
{
    _id: 1,
    chunkSize: 261120,
    filename: &apos;foo&apos;,
    length: Long(&quot;1000&quot;),
    uploadDate: ISODate(&quot;2022-03-19T00:04:17.945Z&quot;)
  }
&lt;p&gt;,&lt;/p&gt;
  {
    _id: ObjectId(&quot;62351e38952337de1d0f8be1&quot;),
    chunkSize: 261120,
    filename: &apos;bar&apos;,
    length: Long(&quot;2000000000&quot;),
    uploadDate: ISODate(&quot;2022-03-19T00:05:12.207Z&quot;)
  }
&lt;p&gt;]&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Analysis:&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;This happens because mongoc_gridfs_create_file_from_stream(), called at example-gridfs.c:116, calls mongoc_gridfs_file_seek() at mongoc-gridfs.c:329, which in turn writes the file (mongoc-gridfs-file.c:971) via _mongoc_gridfs_file_flush_page() (mongoc-gridfs-file.c:674).&lt;/p&gt;

&lt;p&gt;Our example program assumes that the file is still new (and therefore, indirectly, that &quot;is_dirty&quot; is still set). When mongoc_gridfs_file_set_id() is called at example-gridfs.c:130, the function fails because the stream has already been written to a file, as described above.&lt;/p&gt;
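
&lt;p&gt;The failure sequence described here can be sketched as a toy model. This is not libmongoc code: the ToyGridFSFile class, its methods, and the rule that a seek saves any file larger than one chunk are assumptions made purely for illustration; only the 261120 chunk size and the error message come from this ticket.&lt;/p&gt;

```python
# Toy model only: names echo the ticket's discussion, not libmongoc internals.
CHUNK_SIZE = 261120  # chunkSize shown in the fs.files output in this ticket


class ToyGridFSFile:
    """Hypothetical stand-in for a file created from a stream."""

    def __init__(self, length):
        self.length = length
        self.file_id = "auto-generated"
        self.is_dirty = True  # a new, unsaved file starts out dirty

    def seek_to_start(self):
        # Assumption: when the data spans more than one chunk, seeking
        # flushes a page and saves the file, which clears is_dirty.
        if self.length > CHUNK_SIZE:
            self.is_dirty = False

    def set_id(self, new_id):
        # The id may only be changed while the file is still unsaved.
        if not self.is_dirty:
            return False, "Cannot set file id after saving file."
        self.file_id = new_id
        return True, None


def create_file_from_stream(length):
    toy_file = ToyGridFSFile(length)
    toy_file.seek_to_start()  # the hidden seek described in the analysis
    return toy_file


small = create_file_from_stream(1000)
large = create_file_from_stream(2000000000)
print(small.set_id(1))  # (True, None)
print(large.set_id(1))  # (False, 'Cannot set file id after saving file.')
```

&lt;p&gt;In this model the small file accepts the new id, while the large one reports the same error the example program printed and keeps its auto-generated id.&lt;/p&gt;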

&lt;p&gt;Note the &quot;mongofiles&quot; Go tool generates ids on its own in every case rather than allowing autogeneration, so it doesn&apos;t see this issue.&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Recommendations:&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;There are two approaches we might consider. The first is to see if it&apos;s possible to avoid the write in both functions, or at least in mongoc_gridfs_create_file_from_stream(). This effort might be disproportionate unless this is frequently encountered. One workaround is to do what the &quot;mongofiles&quot; tool does and always generate UUIDs on the client side manually; another is to update the written file chunks when a change of id is needed, which is a potentially expensive operation.&lt;/p&gt;
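
&lt;p&gt;To give a rough sense of why rewriting the chunks could be expensive: each document in fs.chunks carries the files_id of its parent file, so changing the id after the fact touches one document per chunk. A minimal sketch of that arithmetic, using the chunkSize and lengths from the reproduction listing (the chunk_count helper is a name introduced here for illustration):&lt;/p&gt;

```python
CHUNK_SIZE = 261120  # chunkSize reported in fs.files


def chunk_count(length, chunk_size=CHUNK_SIZE):
    """Number of fs.chunks documents needed to hold `length` bytes."""
    # Ceiling division using integers only, so large lengths stay exact.
    return -(-length // chunk_size)


for name, length in [("foo", 1000), ("bar", 2000000000)]:
    print(name, chunk_count(length))
```

&lt;p&gt;The 1 KB file fits in a single chunk, while the 2 GB file spans 7660 chunks, each of which would need its files_id rewritten.&lt;/p&gt;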

&lt;p&gt;In any case, it is probably worthwhile to make sure this behavior is documented: a comment in the example and an update to the deprecated API documentation would be helpful.&lt;/p&gt;</comment>
                            <comment id="4362498" author="JIRAUSER1261121" created="Thu, 17 Feb 2022 20:09:46 +0000"  >&lt;p&gt;Hello, thank you for reporting this issue! We will make time to investigate and compare it with &lt;a href=&quot;https://jira.mongodb.org/browse/CDRIVER-1976&quot; title=&quot;mongoc_gridfs_file_set_id always return false, error: Cannot set file id after saving file.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CDRIVER-1976&quot;&gt;&lt;del&gt;CDRIVER-1976&lt;/del&gt;&lt;/a&gt; soon. &lt;/p&gt;</comment>
                            <comment id="4358297" author="JIRAUSER1264943" created="Wed, 16 Feb 2022 11:52:17 +0000"  >&lt;p&gt;I had the same problem as&#160;&#160;&lt;a href=&quot;https://jira.mongodb.org/browse/CDRIVER-1976&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/CDRIVER-1976&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It works normally when the file has only one chunk.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i03fzj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            </customfields>
    </item>
</channel>
</rss>