<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:14:07 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-47408] oplog documents from transactions can breach the maximum bson size and break mongorestore</title>
                <link>https://jira.mongodb.org/browse/SERVER-47408</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;I&apos;ve run into an issue where point-in-time snapshots of a MongoDB server produced using &lt;tt&gt;mongodump --oplog&lt;/tt&gt; can be unusable if the dump coincides with a large transaction write. In these cases, the oplog.bson produced in the dump contains a document that exceeds the 16 MiB (16777216-byte) size limit set in mongo-tools-common, so restoring with &lt;tt&gt;mongorestore --oplogReplay&lt;/tt&gt; fails.&lt;/p&gt;


&lt;p&gt;&lt;tt&gt;# mongorestore --oplogReplay&lt;/tt&gt;&lt;br/&gt;
 {{ 2020-04-08T13:15:13.749+0000 using default &apos;dump&apos; directory}}&lt;br/&gt;
 {{ 2020-04-08T13:15:13.749+0000 preparing collections to restore from}}&lt;br/&gt;
 {{ 2020-04-08T13:15:13.751+0000 reading metadata for foo.junk from dump/foo/junk.metadata.json}}&lt;br/&gt;
 {{ 2020-04-08T13:15:13.762+0000 restoring foo.junk from dump/foo/junk.bson}}&lt;br/&gt;
 {{ 2020-04-08T13:15:16.748+0000 [........................] foo.junk 4.55MB/131MB (3.5%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:19.748+0000 [#.......................] foo.junk 8.74MB/131MB (6.7%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:22.749+0000 [##......................] foo.junk 13.0MB/131MB (10.0%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:25.751+0000 [###.....................] foo.junk 17.6MB/131MB (13.5%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:28.752+0000 [####....................] foo.junk 22.1MB/131MB (16.9%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:31.748+0000 [####....................] foo.junk 26.5MB/131MB (20.3%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:34.748+0000 [#####...................] foo.junk 31.3MB/131MB (24.0%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:37.748+0000 [######..................] foo.junk 36.1MB/131MB (27.7%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:40.753+0000 [#######.................] foo.junk 40.7MB/131MB (31.2%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:43.748+0000 [########................] foo.junk 45.4MB/131MB (34.8%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:46.748+0000 [#########...............] foo.junk 49.7MB/131MB (38.1%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.489+0000 [########################] foo.junk 131MB/131MB (100.0%)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.489+0000 no indexes to restore}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.489+0000 finished restoring foo.junk (1000001 documents, 0 failures)}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.489+0000 replaying oplog}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.496+0000 applied 1 oplog entries}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.496+0000 Failed: restore error: error reading oplog bson input: invalid BSONSize: 16777499 bytes}}&lt;br/&gt;
 {{ 2020-04-08T13:15:48.496+0000 1000001 document(s) restored successfully. 0 document(s) failed to restore.}}&lt;/p&gt;
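
&lt;p&gt;The size in the error above can be checked against the dump file directly, because every BSON document starts with a 4-byte little-endian length prefix. The following is a minimal sketch, not part of the original report, that walks a BSON file and lists documents above the 16 MiB limit. The function name and the &lt;tt&gt;limit&lt;/tt&gt; parameter are hypothetical, and only the length prefixes are read, so nothing is decoded or validated.&lt;/p&gt;

```python
import io

MAX_BSON_SIZE = 16 * 1024 * 1024  # 16 MiB = 16777216 bytes, the limit named in the report


def oversized_documents(stream, limit=MAX_BSON_SIZE):
    """Yield (offset, size) for each BSON document in stream larger than limit.

    Only the 4-byte little-endian length prefix of each document is read;
    the document body is skipped without being decoded.
    """
    offset = 0
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) != 4:
            raise ValueError(f"truncated length prefix at offset {offset}")
        size = int.from_bytes(header, "little", signed=True)
        if 5 > size:  # an empty BSON document is already 5 bytes long
            raise ValueError(f"invalid document size {size} at offset {offset}")
        if size > limit:
            yield offset, size
        stream.seek(size - 4, io.SEEK_CUR)  # skip the rest of this document
        offset += size
```

&lt;p&gt;Assuming the default dump directory shown in the log, opening &lt;tt&gt;dump/oplog.bson&lt;/tt&gt; in binary mode and iterating over &lt;tt&gt;oversized_documents(f)&lt;/tt&gt; should report the entry from the error message: 16777499 bytes, which is 283 bytes over the limit.&lt;/p&gt;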


&lt;p&gt;From looking at the underlying local.oplog.rs collection, I think this may be a problem with how mongod attempts to split transactions into documents when writing to the oplog. By querying the local.oplog.rs collection it&apos;s possible to see the offending BSON documents in the oplog. Interestingly, despite this issue, it appears that replication still works correctly, although I haven&apos;t tested enough to confidently say this is the case.&lt;/p&gt;


&lt;p&gt;I have attached one of the offending oplog.bson dumps to this issue.&lt;/p&gt;</description>
                <environment>OS: CentOS Linux release 7.7.1908 (Core)&lt;br/&gt;
Kernel: 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020&lt;br/&gt;
&lt;br/&gt;
Also affects the mongo:4.2.5 docker image</environment>
        <key id="1308300">SERVER-47408</key>
            <summary>oplog documents from transactions can breach the maximum bson size and break mongorestore</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="george@netcraft.com">George Field</reporter>
                        <labels>
                            <label>backup</label>
                            <label>mongod</label>
                            <label>oplog</label>
                            <label>replication</label>
                            <label>transactions</label>
                    </labels>
                <created>Wed, 8 Apr 2020 14:39:39 +0000</created>
                <updated>Fri, 22 Sep 2023 18:57:07 +0000</updated>
                            <resolved>Tue, 21 Apr 2020 14:51:01 +0000</resolved>
                                    <version>4.2.5</version>
                                                    <component>Replication</component>
                    <component>Tools</component>
                                        <votes>0</votes>
                                    <watches>12</watches>
                                                                                                                <comments>
                            <comment id="3049137" author="jessica.sigafoos" created="Tue, 21 Apr 2020 14:50:21 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=george%40netcraft.com&quot; class=&quot;user-hover&quot; rel=&quot;george@netcraft.com&quot;&gt;george@netcraft.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Please note that this work is now being tracked in &lt;a href=&quot;https://jira.mongodb.org/browse/TOOLS-2495&quot; title=&quot;Oplog replay can&amp;#39;t handle entries &amp;gt; 16 MB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;TOOLS-2495&quot;&gt;&lt;del&gt;TOOLS-2495&lt;/del&gt;&lt;/a&gt;, and this ticket will be closed as a duplicate.&#160; If you&apos;d like to track the progress of the work moving forward, please watch &lt;a href=&quot;https://jira.mongodb.org/browse/TOOLS-2495&quot; title=&quot;Oplog replay can&amp;#39;t handle entries &amp;gt; 16 MB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;TOOLS-2495&quot;&gt;&lt;del&gt;TOOLS-2495&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thank you!&lt;br/&gt;
Jess&lt;/p&gt;</comment>
                            <comment id="3033866" author="carl.champain" created="Thu, 9 Apr 2020 18:46:20 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=george%40netcraft.com&quot; class=&quot;user-hover&quot; rel=&quot;george@netcraft.com&quot;&gt;george@netcraft.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thank you for taking the time to submit this report!&lt;br/&gt;
We are passing this ticket along to the appropriate team for further investigation. Updates will be posted on this ticket as they happen.&lt;/p&gt;

&lt;p&gt;Kind regards,&lt;br/&gt;
Carl&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="1185463">TOOLS-2495</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="1314388">TOOLS-2542</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="255448" name="oplog.bson.gz" size="83335" author="george@netcraft.com" created="Wed, 8 Apr 2020 14:38:09 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25144"><![CDATA[Tools]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 9 Apr 2020 18:46:20 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        3 years, 42 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>martin.bajana@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            3 years, 42 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>carl.champain@mongodb.com</customfieldvalue>
            <customfieldvalue>george@netcraft.com</customfieldvalue>
            <customfieldvalue>jessica.sigafoos@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hxe5j3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hx1p5r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;Reproducing the large oplog entries is straightforward: make a number of very large writes in a transaction. The problem is easier to demonstrate using mongodump, so the steps below use it.&lt;/p&gt;


&lt;p&gt;Capturing the error with mongodump relies on timing, so depending on hardware the test case might not hit the issue, although it&apos;s working reliably on my machine. As per the issue description, this is affecting MongoDB 4.2.5, installed on CentOS 7 via the official mongodb-org-4.2 RPM repo, but I&apos;ve also been able to reproduce using the mongo:4.2.5 docker image.&lt;/p&gt;


&lt;p&gt;To reproduce, it&apos;s necessary to commit a large transaction while mongodump is dumping the database. I&apos;ve written a small Go command that writes a large amount of dummy data to a collection (in order to slow down the mongodump operation), and another Go command that performs a short transaction writing a few large documents. Steps:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Create a sufficiently large collection such that mongodump will take at least a few seconds to complete.&lt;/li&gt;
	&lt;li&gt;Start a mongodump with oplog capture enabled (&lt;tt&gt;mongodump --oplog&lt;/tt&gt;).&lt;/li&gt;
	&lt;li&gt;While mongodump is running, commit a transaction that writes a very large amount of data, so that the transaction is captured in mongodump&apos;s oplog.bson.&lt;/li&gt;
	&lt;li&gt;Try to inspect the produced oplog.bson with bsondump, or try to restore the dump with &lt;tt&gt;mongorestore --oplogReplay&lt;/tt&gt;.&lt;/li&gt;
&lt;/ol&gt;
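
&lt;p&gt;For the transaction in step 3, one way to size a dummy document is to work out the fixed BSON encoding overhead and pad the rest. The following is a minimal sketch, not taken from the linked repository: it hand-encodes a document with a single string field (the field name &lt;tt&gt;payload&lt;/tt&gt; and both function names are hypothetical) and computes the filler needed to hit a target encoded size. Drivers normally add an &lt;tt&gt;_id&lt;/tt&gt; field on insert, so the target should leave some headroom below 16 MiB.&lt;/p&gt;

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16 MiB user document limit


def encode_string_doc(key, value):
    """Encode {key: value}, with one UTF-8 string field, as a BSON document."""
    key_bytes = key.encode("utf-8") + b"\x00"  # element name as a cstring
    val_bytes = value.encode("utf-8") + b"\x00"  # string data plus terminator
    # String element: type byte 0x02, name, int32 length (terminator included), data.
    element = b"\x02" + key_bytes + len(val_bytes).to_bytes(4, "little") + val_bytes
    total = 4 + len(element) + 1  # length prefix + element + trailing 0x00
    return total.to_bytes(4, "little") + element + b"\x00"


def filler_for_target(key, target_size):
    """Return an ASCII filler so that {key: filler} encodes to target_size bytes."""
    overhead = len(encode_string_doc(key, ""))
    if overhead > target_size:
        raise ValueError("target is smaller than the fixed per-document overhead")
    return "x" * (target_size - overhead)
```

&lt;p&gt;For example, &lt;tt&gt;filler_for_target("payload", MAX_BSON_SIZE - 1024)&lt;/tt&gt; gives a string whose enclosing document encodes to exactly 1 KiB under the limit before any &lt;tt&gt;_id&lt;/tt&gt; is added.&lt;/p&gt;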



&lt;p&gt;Some code to help reproduce can be found here, tested on Go 1.14:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/ks07/mongodb-oplog-bug&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/ks07/mongodb-oplog-bug&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A full log showing usage in reproducing the error and the error output:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://gist.github.com/ks07/869229ea0fd26c6b0058f81427691070&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://gist.github.com/ks07/869229ea0fd26c6b0058f81427691070&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;Note that the example writes a small number of large documents very close to the 16 MiB limit, but in production usage we have noticed this behaviour with transactions writing a large number of smaller documents.&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hxdrsf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>