<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:10:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-6099] Advanced Queries $gt and $lt take longer in &apos;indexed&apos; collection</title>
                <link>https://jira.mongodb.org/browse/SERVER-6099</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;My collection has 1 million documents.&lt;/p&gt;

&lt;p&gt;I am running this query to retrieve documents where&lt;br/&gt;
pid &amp;gt; 700000 or pid &amp;lt; 100000&lt;/p&gt;

&lt;p&gt;In case 1, the pid (player id) field of the collection isn&apos;t indexed.&lt;br/&gt;
a) if I query with {$or:[{pid:{$gt:700000}},{pid:{$lt:100000}}]},&lt;br/&gt;
   it took around 750ms&lt;br/&gt;
b) if I query with {$or:[{pid:{$lt:100000}},{pid:{$gt:700000}}]} (simply changing the order of the expressions),&lt;br/&gt;
   it took around 800ms&lt;/p&gt;

&lt;p&gt;Everything is still ok.&lt;/p&gt;

&lt;p&gt;BUT in case 2, I indexed pid, which is supposed to make the query faster. However,&lt;br/&gt;
a) if I query with {$or:[{pid:{$gt:700000}},{pid:{$lt:100000}}]},&lt;br/&gt;
   it took around 980ms&lt;br/&gt;
b) WORSE, if I query with {$or:[{pid:{$lt:100000}},{pid:{$gt:700000}}]},&lt;br/&gt;
   it took around 1600ms!!&lt;/p&gt;

&lt;p&gt;I have 2 questions here:&lt;br/&gt;
1. Why does the indexed collection take longer?&lt;br/&gt;
2. Why does the order of the expressions make such a big difference in query time? (Does it have something to do with $or?)&lt;/p&gt;
</description>
                <environment>linux</environment>
        <key id="41320">SERVER-6099</key>
            <summary>Advanced Queries $gt and $lt take longer in &apos;indexed&apos; collection</summary>
                <type id="6" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14720&amp;avatarType=issuetype">Question</type>
                                            <priority id="2" iconUrl="https://jira.mongodb.org/images/icons/priorities/critical.svg">Critical - P2</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="kristina">Kristina Chodorow</assignee>
                                    <reporter username="yong2khoo">btd5nerds</reporter>
                        <labels>
                            <label>query</label>
                    </labels>
                <created>Fri, 15 Jun 2012 02:53:10 +0000</created>
                <updated>Mon, 11 Jul 2016 18:32:22 +0000</updated>
                            <resolved>Tue, 26 Jun 2012 18:58:03 +0000</resolved>
                                    <version>2.0.5</version>
                                                    <component>Querying</component>
                                        <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="136769" author="kristina" created="Tue, 26 Jun 2012 18:58:03 +0000"  >&lt;p&gt;Great!&lt;/p&gt;</comment>
                            <comment id="136692" author="yong2khoo" created="Tue, 26 Jun 2012 16:30:04 +0000"  >&lt;p&gt;oO. This makes everything clearer! &lt;/p&gt;

&lt;p&gt;Kristina, thanks a lot! This can be considered resolved. :)&lt;/p&gt;</comment>
                            <comment id="136665" author="kristina" created="Tue, 26 Jun 2012 15:35:50 +0000"  >&lt;p&gt;You&apos;re welcome!  &lt;/p&gt;

&lt;p&gt;Sorry it&apos;s taken me a while to get back to you, I&apos;ve been trying to figure this out.&lt;/p&gt;

&lt;p&gt;The likelihood of a query yielding is proportional to how long it runs and the two $or clauses are treated as separate queries.   Later $or clauses can be slower than earlier ones because they do the de-duping.   So, if &amp;gt;700k is slower &lt;em&gt;and&lt;/em&gt; second, it is more likely to hit the yield threshold.&lt;/p&gt;
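
&lt;p&gt;A rough way to picture the de-duping cost is the toy JavaScript model below. It is only a sketch under stated assumptions, not how the server is implemented: runOr, the ten sample documents and the dedupChecks counter are all made up for illustration, and each clause is modelled as a plain scan that has to check its matches against everything earlier clauses already returned.&lt;/p&gt;

```javascript
// Toy model only: this is NOT MongoDB source code.
// Each $or clause is run as its own pass over the data, and every clause
// after the first must skip documents an earlier clause already returned.
function runOr(docs, clauses) {
  const seen = new Set();  // _id values returned so far
  const results = [];
  const dedupChecks = [];  // per clause: matches that had to be de-dup checked
  clauses.forEach((matches, clauseIndex) => {
    let checks = 0;
    for (const doc of docs) {
      if (!matches(doc)) continue;
      if (clauseIndex > 0) checks += 1;  // only later clauses pay this cost
      if (seen.has(doc._id)) continue;   // already returned by an earlier clause
      seen.add(doc._id);
      results.push(doc);
    }
    dedupChecks.push(checks);
  });
  return { results, dedupChecks };
}

// Hypothetical sample data: ten documents with pid 0, 100000, ..., 900000.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i, pid: i * 100000 }));
const gt700k = (doc) => doc.pid > 700000;  // matches 2 documents
const lt100k = (doc) => 100000 > doc.pid;  // matches 1 document

const gtFirst = runOr(docs, [gt700k, lt100k]);
const ltFirst = runOr(docs, [lt100k, gt700k]);
console.log(gtFirst.results.length, gtFirst.dedupChecks);
console.log(ltFirst.results.length, ltFirst.dedupChecks);
```

&lt;p&gt;In this model both orders return the same documents, but the clause that runs second pays one membership check per document it matches, so putting the clause that matches more documents second means more extra work.&lt;/p&gt;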

&lt;p&gt;Also, yielding will only occur when there&apos;s a write waiting to happen.  I&apos;m not sure what your system is doing, but that might be a factor.&lt;/p&gt;</comment>
                            <comment id="135782" author="yong2khoo" created="Fri, 22 Jun 2012 16:43:25 +0000"  >&lt;p&gt;oO. That&apos;s great info.&lt;br/&gt;
Right, that &apos;yield&apos; is probably the culprit in my experiments.&lt;br/&gt;
From observation, case 2) tends to yield more often than case 1) (any idea why?)&lt;/p&gt;

&lt;p&gt;Anyway, thanks for the explanation.&lt;/p&gt;</comment>
                            <comment id="135778" author="kristina" created="Fri, 22 Jun 2012 16:33:25 +0000"  >&lt;p&gt;Sure! MongoDB has fairly coarse locking (in 2.0, one read/write lock per process, in 2.2 it&apos;ll be one R/W lock per database).  To prevent writes from getting stuck behind long-running reads, queries occasionally yield the read lock so some writes can go, then they&apos;ll grab it again.  The nyields field in explain() output shows you how many times a query yielded the read lock.  &lt;/p&gt;</comment>
                            <comment id="135765" author="yong2khoo" created="Fri, 22 Jun 2012 16:25:57 +0000"  >&lt;p&gt;oO. I think you have just mentioned a term that&apos;s unfamiliar to me: yield. Could you elaborate a little on what yield is? I have searched around but didn&apos;t get a clear picture of it. Thanks in advance.&lt;/p&gt;</comment>
                            <comment id="135731" author="kristina" created="Fri, 22 Jun 2012 15:15:25 +0000"  >&lt;p&gt;&amp;gt; &quot;2%-40%&quot;&lt;br/&gt;
&amp;gt; This is quite a huge range =.=&lt;/p&gt;

&lt;p&gt;Yeah, it varies on data/access patterns/query/indexes, sorry :-/  I&apos;d generally time both for anything returning more than 20%.&lt;/p&gt;

&lt;p&gt;&amp;gt; I thought they are supposed to consume the similar period of time, but they don&apos;t (and the difference is 2 times). Is this expected?&lt;/p&gt;

&lt;p&gt;At least in the example you gave, (2) yields once, which would explain the difference in time.  Does (2) consistently yield (nyields&amp;gt;0) and (1) does not?  If it does not always yield, is the difference in query time present when nyields=0?&lt;/p&gt;</comment>
                            <comment id="135581" author="yong2khoo" created="Fri, 22 Jun 2012 01:32:05 +0000"  >&lt;p&gt;First, thanks for the reply.&lt;/p&gt;

&lt;p&gt;&quot;2%-40%&quot;&lt;br/&gt;
This is quite a huge range =.=&lt;br/&gt;
By the way (apart from the index issue), if I query the same range of data in two different ways:&lt;br/&gt;
1) &amp;gt;700k OR &amp;lt;100k&lt;br/&gt;
2) &amp;lt;100k OR &amp;gt;700k&lt;br/&gt;
I thought they are supposed to consume the similar period of time, but they don&apos;t (and the difference is 2 times). Is this expected?&lt;/p&gt;</comment>
                            <comment id="135324" author="kristina" created="Thu, 21 Jun 2012 14:47:08 +0000"  >&lt;p&gt;It looks like the issue is that using an index is &lt;em&gt;less&lt;/em&gt; efficient than a table scan when you&apos;re returning a large hunk of your data set, which is a known limitation of databases in general (relational and non-relational).  Going from index entry to doc for each matching element is not worth the overhead once you&apos;re returning a certain percent of your data (estimates vary from 2%-40% of your data... kind of depends on what you&apos;re doing).  You can use .hint({$natural:1}) to force a table scan.&lt;/p&gt;</comment>
                            <comment id="133801" author="yong2khoo" created="Mon, 18 Jun 2012 02:30:05 +0000"  >&lt;p&gt;fyi, the &apos;gcid&apos; in the images is actually the &apos;pid&apos; (I just renamed it)&lt;/p&gt;</comment>
                            <comment id="133800" author="yong2khoo" created="Mon, 18 Jun 2012 02:23:03 +0000"  >&lt;p&gt;Hi Kristina,&lt;/p&gt;

&lt;p&gt;I have uploaded 3 images:&lt;br/&gt;
a) no index.bmp &lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;this is the query performance when the collection has no index.&lt;/li&gt;
	&lt;li&gt;the left side shows db.coll.find({$or:[{pid:{$gt:700000}},{pid:{$lt:100000}}]}).explain()&lt;/li&gt;
	&lt;li&gt;the right side shows db.coll.find({$or:[{pid:{$lt:100000}},{pid:{$gt:700000}}]}).explain()&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;b) indexOr.bmp&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;this is the query against the indexed collection&lt;/li&gt;
	&lt;li&gt;db.coll.find({$or:[{pid:{$gt:700000}},{pid:{$lt:100000}}]}).explain()&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;c) indexOr2.bmp&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;this is the query against the indexed collection&lt;/li&gt;
	&lt;li&gt;db.coll.find({$or:[{pid:{$lt:100000}},{pid:{$gt:700000}}]}).explain()&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The difference between b) and c) can grow very large as the collection grows.&lt;br/&gt;
I have tried inserting 20M documents. Case c) takes 2 times longer than case b).&lt;/p&gt;</comment>
                            <comment id="133315" author="kristina" created="Fri, 15 Jun 2012 15:16:46 +0000"  >&lt;p&gt;Can you do an explain on each of the options?  E.g.,&lt;/p&gt;

&lt;pre&gt;&amp;gt; db.coll.find({$or:[{pid:{$gt:700000}},{pid:{$lt:100000}}]}).explain()&lt;/pre&gt;

&lt;p&gt;and so on?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="17303" name="indexOr.bmp" size="1146294" author="yong2khoo" created="Mon, 18 Jun 2012 02:12:05 +0000"/>
                            <attachment id="17304" name="indexOr2.bmp" size="1150770" author="yong2khoo" created="Mon, 18 Jun 2012 02:12:05 +0000"/>
                            <attachment id="17302" name="no index.bmp" size="280422" author="yong2khoo" created="Mon, 18 Jun 2012 02:11:25 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 15 Jun 2012 15:16:46 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        11 years, 34 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            11 years, 34 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>yong2khoo</customfieldvalue>
            <customfieldvalue>kristina</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrnzxz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrir87:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>23064</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht0rn3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>