<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:59:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-22214] CloneCollection with query doing COLLSCAN although there is an index on field</title>
                <link>https://jira.mongodb.org/browse/SERVER-22214</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Hello!&lt;/p&gt;

&lt;p&gt;I have a capped collection with about 150 million documents in it (20 GB of data). The insertion rate to this collection is about 500-100 doc/sec. I have an index on one of the fields - &quot;start&quot;, which is a number. I run a scheduled cloneCollection from another mongod instance (both instances run version 3.2) with a query like {start: {$gt: &amp;lt;some number&amp;gt;}}. The cloneCollection command takes a very long time, about 10 minutes, and the duration slowly increases with each subsequent cloneCollection.&lt;/p&gt;

&lt;p&gt;When I searched through mongod.log, I found entries like this:&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;2015-12-25T16:09:53.073+0600 I QUERY    [conn2951] query statserver-collector.statrecords query: { query: { _id: { $gt: 165091491.0 } }, $snapshot: true } planSummary: COLLSCAN cursorid:22150682963 ntoreturn:0 ntoskip:0 exhaust:1 keysExamined:0 docsExamined:121741412 keyUpdates:0 writeConflicts:0 numYields:952128 nreturned:101 reslen:17236 locks:{ Global: { acquireCount: { r: 1904258 } }, Database: { acquireCount: { r: 952129 } }, Collection: { acquireCount: { r: 952129 } } } 339569ms&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;I see that the planSummary is COLLSCAN. As I understand it, this means that the query runs without using an index, although there is an index on &quot;start&quot;. When I run a similar query in the mongo console:&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;db.mycollection.find({_id: {$gt: 165091491}})&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;the planSummary is IXSCAN and the query returns rows very quickly.&lt;/p&gt;

&lt;p&gt;The main thing is: we downgraded MongoDB from 3.2 to 3.0.8 (only the one instance that we run cloneCollection from), and this issue simply disappeared. We didn&apos;t change our code, the steps are the same, and cloneCollection now finishes in a few seconds.&lt;/p&gt;

&lt;p&gt;So I think the issue is somewhere in v3.2. I searched but didn&apos;t find any reported problems similar to this.&lt;/p&gt;</description>
                <environment></environment>
        <key id="259118">SERVER-22214</key>
            <summary>CloneCollection with query doing COLLSCAN although there is an index on field</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="backlog-server-query">Backlog - Query Team</assignee>
                                    <reporter username="aborunov">Andrey Borunov</reporter>
                        <labels>
                            <label>query-44-grooming</label>
                    </labels>
                <created>Mon, 18 Jan 2016 08:02:18 +0000</created>
                <updated>Tue, 6 Dec 2022 04:35:53 +0000</updated>
                            <resolved>Fri, 26 Jul 2019 20:54:09 +0000</resolved>
                                    <version>3.2.0</version>
                                                    <component>Querying</component>
                                        <votes>0</votes>
                                    <watches>7</watches>
                                                                                                                <comments>
                            <comment id="2346398" author="david.storch" created="Fri, 26 Jul 2019 20:53:27 +0000"  >&lt;p&gt;This case revolves around changes in behavior to the &lt;tt&gt;snapshot()&lt;/tt&gt; find command option, but support for this option was removed in version 4.0 under &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-32174&quot; title=&quot;Remove old snapshot query option&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-32174&quot;&gt;&lt;del&gt;SERVER-32174&lt;/del&gt;&lt;/a&gt;. Therefore, I don&apos;t think there is any action for us to take around this ticket. Closing as Gone Away.&lt;/p&gt;</comment>
                            <comment id="1149124" author="rassi@10gen.com" created="Wed, 20 Jan 2016 20:20:42 +0000"  >&lt;p&gt;Just based on your description, perhaps the number of documents returned from your initial clone query { _id: { $gt: 165091491.0 } } has changed in the last week?&lt;/p&gt;

&lt;p&gt;To diagnose further, you can emulate the query that cloneCollection issues against the remote server with find(...).snapshot(); if you run this query with explain(&quot;executionStats&quot;) and take a look at the &quot;executionMillisEstimate&quot; figure, you&apos;ll see a breakdown of where the execution time is going.&lt;/p&gt;</comment>
                            <comment id="1148685" author="aborunov" created="Wed, 20 Jan 2016 15:53:16 +0000"  >
&lt;p&gt;Thank you, Jason! I will try the given workaround. &lt;/p&gt;

&lt;p&gt;One thing I also wanted to mention: my collection is capped and the limit has been reached. So the number of documents in it does not change (only the _id values change), and I assumed that the time to scan through them would be constant too, but it is not. The time was slowly increasing: at first it was 5 minutes, and a week later (we perform cloneCollection every 5-10 minutes) it was about 13 minutes. What is the possible reason for that? Is it normal behaviour?&lt;/p&gt;</comment>
                            <comment id="1147478" author="rassi@10gen.com" created="Tue, 19 Jan 2016 17:19:18 +0000"  >&lt;p&gt;This indeed has changed in 3.2, as a consequence of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-19593&quot; title=&quot;Allow collscans on $snapshot queries when not using MMAP1&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-19593&quot;&gt;&lt;del&gt;SERVER-19593&lt;/del&gt;&lt;/a&gt;.  The cloneCollection command sets the &lt;a href=&quot;https://docs.mongodb.org/manual/reference/method/cursor.snapshot/#cursor.snapshot&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&quot;snapshot&quot;&lt;/a&gt; query option on queries issued against the remote server.  Before 3.2, the &quot;snapshot&quot; query option always forced an index scan on the _id index, but as of 3.2 this query option forces a collection scan when used against a server configured with the WiredTiger storage engine.&lt;/p&gt;

&lt;p&gt;We should consider whether or not we want to change the &quot;cloneCollection&quot; and &quot;copydb&quot; commands to no longer use the &quot;snapshot&quot; query option (note that the &quot;clone&quot; command has never used this query option, as far as I can tell).  I&apos;m moving this ticket to the &quot;Needs Triage&quot; state; we should make this decision sometime in the next few weeks.&lt;/p&gt;

&lt;p&gt;Andrey: thanks for reporting this issue.  As a workaround in the meantime, you could change your application to no longer use the &quot;cloneCollection&quot; command, and instead implement similar functionality on the client side by issuing a query with a {_id: {$gt: ...}} predicate on the &quot;source&quot; server, followed by a bulk insert on the &quot;destination&quot; server.&lt;/p&gt;</comment>
                            <comment id="1146263" author="aborunov" created="Mon, 18 Jan 2016 08:13:27 +0000"  >&lt;p&gt;Sorry, another correction: there is no index on the &quot;start&quot; field and we do not query by this field, as one can see in the log. Instead, we query by the _id field, which is a number in our case and indeed has an index.&lt;/p&gt;</comment>
                            <comment id="1146260" author="aborunov" created="Mon, 18 Jan 2016 08:04:33 +0000"  >&lt;p&gt;&amp;gt; The insertion rate to this collection is about 500-100 doc/sec&lt;/p&gt;

&lt;p&gt;Correction: 500-1000 doc/sec&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="222643">SERVER-19593</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="468230">SERVER-32174</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25143"><![CDATA[Query]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 19 Jan 2016 17:19:18 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 28 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 28 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>aborunov</customfieldvalue>
            <customfieldvalue>backlog-server-query</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>rassi</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrkkfj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr72tb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs2s6n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>