<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 02:55:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-964] Get $elemMatch to correctly use the index on the array</title>
                <link>https://jira.mongodb.org/browse/SERVER-964</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;I am using MongoDB to store, at the moment, 100,000 documents. Each document can have from 1 to 150 properties. I need to search by property range for any combination of properties. On average, a document has 5 properties set. Here are, for example, two documents: doc1 has properties 1, 2 and 17 set, and doc2 has properties 1, 3 and 45.&lt;/p&gt;

&lt;p&gt;doc1 = &lt;/p&gt;
{prop1: 123.1, prop2: 345.2, prop17: 12.0}
&lt;p&gt;doc2 = &lt;/p&gt;
{prop1: 124.1, prop3: 33455.2, prop45: 11232.0}

&lt;p&gt;The kind of queries needed are:&lt;/p&gt;

&lt;p&gt;db.collection.find({prop1: {$gt: 123.0, $lt: 250.0}, prop3: {$gt: 10.0}});&lt;/p&gt;

&lt;p&gt;It is not possible to create 150 indexes, one for each property, so Mathias (mstearn) suggested storing the properties this way, with a single index on p:&lt;/p&gt;

&lt;p&gt;doc1 = {p: [&lt;/p&gt;
{prop1: 123.1}
&lt;p&gt;, &lt;/p&gt;
{prop2: 345.2}
&lt;p&gt;, &lt;/p&gt;
{prop17: 12.0}
&lt;p&gt;]}&lt;br/&gt;
doc2 = {p: [&lt;/p&gt;
{prop1: 124.1}
&lt;p&gt;, &lt;/p&gt;
{prop3: 33455.2}
&lt;p&gt;, &lt;/p&gt;
{prop45: 11232.0}
&lt;p&gt;]}&lt;/p&gt;

&lt;p&gt;And to query this way:&lt;/p&gt;

&lt;p&gt;db.testarray.find({&quot;p&quot;: {$elemMatch: {prop1: {$gt: 123.0, $lt: 250.0}, prop3: {$gt: 10.0}}}});&lt;/p&gt;

&lt;p&gt;But in that case, the index on p is not used and the complete collection is scanned. &lt;/p&gt;

&lt;p&gt;See the attached JavaScript, which sets up a small 10,000-document collection with some test data. &lt;/p&gt;</description>
                <environment>$ uname -a&lt;br/&gt;
Linux server 2.6.24-27-generic #1 SMP Fri Mar 12 01:10:31 UTC 2010 i686 GNU/Linux</environment>
        <key id="11699">SERVER-964</key>
            <summary>Get $elemMatch to correctly use the index on the array</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="eliot">Eliot Horowitz</assignee>
                                    <reporter username="loic">Lo&#239;c d&apos;Anterroches</reporter>
                        <labels>
                    </labels>
                <created>Mon, 5 Apr 2010 16:02:36 +0000</created>
                <updated>Fri, 7 Mar 2014 00:57:20 +0000</updated>
                            <resolved>Tue, 6 Apr 2010 09:39:16 +0000</resolved>
                                    <version>1.3.2</version>
                                                    <component>Querying</component>
                                        <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="13524" author="loic" created="Fri, 9 Apr 2010 03:02:47 +0000"  >&lt;p&gt;Aaron, yeah! I was puzzled: I now have 10 different ways to store and index the data, and I was wondering why some queries were faster than others with the $all keyword. Now I can use my knowledge of the data to write my queries in an order that matches the least common properties first. Really, really good tip.&lt;/p&gt;

&lt;p&gt;For the compound index, I totally forgot about it, thanks for the reminder, now back to the tests.&lt;/p&gt;

&lt;p&gt;This discussion reveals one clear thing: know and analyze your dataset if you want to get the most out of MongoDB.&lt;/p&gt;</comment>
                            <comment id="13519" author="aaron" created="Thu, 8 Apr 2010 17:00:05 +0000"  >&lt;p&gt;Hi Loic,&lt;/p&gt;

&lt;p&gt;If you want to look up quickly on both key and val, you can create a compound index like this &lt;/p&gt;
{&apos;p.key&apos;:1,&apos;p.val&apos;:1}
&lt;p&gt;.  (Creating separate indexes on key and val won&apos;t help, as you&apos;ve mentioned.)&lt;/p&gt;

&lt;p&gt;Also, I should point out that when you use $all, index matching is currently just performed on the first element in the $all array spec.  So all documents matching that first $all element are checked to see if they match.&lt;/p&gt;</comment>
                            <comment id="13496" author="loic" created="Wed, 7 Apr 2010 14:38:31 +0000"  >&lt;p&gt;Aaron, thanks a lot. I keep commenting here as it may be of interest for other people.&lt;/p&gt;

&lt;p&gt;In Aaron&apos;s particular example, even if you add a &quot;p.value&quot; index, it is not used. This means that the search will scan all the documents having the given property. This is good and bad. &lt;/p&gt;

&lt;p&gt;The good point: when you have a very sparse &quot;density of properties&quot;, you can cut the search down tremendously, and this is my case, except for some of the properties. &lt;/p&gt;

&lt;p&gt;The bad point is that if you have one property which is always set for your 26 million documents, this is not going to work. For example, the weight of a molecule, as it is always known.&lt;/p&gt;

&lt;p&gt;The solution: put all the properties with a very low count in the storage proposed by Aaron and use dedicated indexes for the high-density properties. Be smart about rebalancing the indexes over time, depending on the usage patterns and the data in the db.&lt;/p&gt;

&lt;p&gt;This is really great because it means no special hack! Thanks a lot!&lt;/p&gt;</comment>
                            <comment id="13494" author="aaron" created="Wed, 7 Apr 2010 13:29:48 +0000"  >&lt;p&gt;So, just in case you try the key/val method I described above you can put several $elemMatch specs inside an $all in order to query multiple properties simultaneously.&lt;/p&gt;</comment>
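Editorial sketch of the $all plus $elemMatch combination described in the comment above, using the key/val storage model from this thread. It only builds the query document in plain JavaScript; buildPropertyQuery is a hypothetical helper name, and the property names and bounds are reused from the issue description as sample values.

```javascript
// Hypothetical helper: turn { propName: rangeSpec } into one query document
// that wraps one $elemMatch clause per property inside a single $all.
function buildPropertyQuery(ranges) {
  const clauses = Object.entries(ranges).map(([name, range]) => ({
    $elemMatch: { key: name, val: range },
  }));
  return { p: { $all: clauses } };
}

// Sample ranges taken from the issue description.
const query = buildPropertyQuery({
  prop1: { $gt: 123.0, $lt: 250.0 },
  prop3: { $gt: 10.0 },
});

// In the mongo shell this document could be passed to db.testarray.find(query).
console.log(JSON.stringify(query, null, 2));
```

Clause order follows the order of the keys passed in. According to the 8 Apr 2010 comment in this thread, index matching at that time used only the first $all element, so listing the rarest property first may reduce the number of documents checked.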
                            <comment id="13467" author="loic" created="Tue, 6 Apr 2010 13:21:13 +0000"  >&lt;p&gt;Thanks a lot for the time you took to answer. I will check this memcmp format; it may be the most efficient way in the end, as I will not have to store the name of the property as a key in the array and the indexing will be efficient on the &quot;missing properties&quot;. &lt;/p&gt;

&lt;p&gt;Really thank you, I will keep you informed.&lt;/p&gt;</comment>
                            <comment id="13465" author="eliot" created="Tue, 6 Apr 2010 13:11:42 +0000"  >&lt;p&gt;one option is munging key and value&lt;/p&gt;

&lt;p&gt;so before&lt;br/&gt;
doc1 = &lt;/p&gt;
{prop1: 123.1, prop2: 345.2, prop17: 12.0}
&lt;p&gt; &lt;br/&gt;
have&lt;/p&gt;
{ properties : [ &quot;1---123.&quot; , &quot;2---345.2&quot; ] }
&lt;p&gt;then you could only do alpha compares.&lt;/p&gt;

&lt;p&gt;another option is to use a binary type or just a binary array&lt;/p&gt;

&lt;p&gt;use the first 4 bytes as the key, and then put your data in a memcmp format.&lt;/p&gt;</comment>
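Editorial sketch of the key-and-value munging idea from the comment above, in plain JavaScript. The fixed-width, zero-padded layout is an assumption added here so that alphabetical order agrees with numeric order; it assumes non-negative values below one billion and three decimal places. mungeProperty and hasPropertyInRange are hypothetical names.

```javascript
// Hypothetical encoder: property number and value become one sortable string.
// Zero padding makes string comparison agree with numeric comparison
// (assumption: value is non-negative and has at most 9 integer digits).
function mungeProperty(propNumber, value) {
  const key = String(propNumber).padStart(3, "0");
  const val = value.toFixed(3).padStart(13, "0");
  return key + "---" + val;
}

const doc1 = {
  properties: [mungeProperty(1, 123.1), mungeProperty(2, 345.2), mungeProperty(17, 12.0)],
};

// Exclusive range check on one property using only string comparisons.
function hasPropertyInRange(doc, propNumber, low, high) {
  const lowBound = mungeProperty(propNumber, low);
  const highBound = mungeProperty(propNumber, high);
  return doc.properties
    .filter((s) => s > lowBound)
    .some((s) => highBound > s);
}

console.log(mungeProperty(1, 123.1)); // "001---000000123.100"
console.log(hasPropertyInRange(doc1, 1, 123.0, 250.0)); // true
```

Because each string starts with the property number, a range on one property becomes a single contiguous string range, which is the kind of comparison an index on the array could serve.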
                            <comment id="13464" author="aaron" created="Tue, 6 Apr 2010 13:01:14 +0000"  >&lt;p&gt;An alternative data model you could use is this:&lt;/p&gt;

&lt;p&gt;{p: [&lt;/p&gt;
{key:&quot;prop1&quot;,val:123.4}
&lt;p&gt;,&lt;/p&gt;
{key:&quot;prop2&quot;,val:555}
&lt;p&gt;]}&lt;/p&gt;

&lt;p&gt;then make an index on &lt;/p&gt;
{&apos;p.key&apos;:1}
&lt;p&gt; and query like {p:{$elemMatch:{key:&quot;prop1&quot;,val:{$gt:1}}}}.  The downside is that you can only query for one property type at a time.&lt;/p&gt;</comment>
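Editorial sketch, in plain JavaScript with no MongoDB server involved, of why the key/val model in the comment above is queried with $elemMatch rather than with two dotted-field conditions. With $elemMatch, a single array element has to satisfy every condition; with dotted fields, different elements may each satisfy one condition. The function names and sample values are assumptions made for illustration.

```javascript
// Sample document in the key/val model; the values are made up.
const doc = { p: [{ key: "prop1", val: 0.5 }, { key: "prop2", val: 555 }] };

// Emulates {p: {$elemMatch: {key: K, val: {$gt: V}}}}:
// the same element must have the key and the larger value.
function elemMatch(document, key, minVal) {
  return document.p
    .filter((e) => e.key === key)
    .some((e) => e.val > minVal);
}

// Emulates {"p.key": K, "p.val": {$gt: V}}:
// each condition may be met by a different element.
function dottedMatch(document, key, minVal) {
  const keyFound = document.p.some((e) => e.key === key);
  const valFound = document.p.some((e) => e.val > minVal);
  return [keyFound, valFound].every(Boolean);
}

console.log(elemMatch(doc, "prop1", 1));   // false: prop1 is only 0.5
console.log(dottedMatch(doc, "prop1", 1)); // true: prop2 supplies the value
```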
                            <comment id="13462" author="loic" created="Tue, 6 Apr 2010 12:04:14 +0000"  >&lt;p&gt;OK, so do I understand correctly that one cannot have 100 independent properties for a document and then search on two of them with greater/lower bounds? I tried all the storage combinations:&lt;/p&gt;

&lt;p&gt;1. &lt;/p&gt;
{prop1: 123.4, prop2: 234.5, ...}
&lt;p&gt;2. {p: [&lt;/p&gt;
{prop1: 123.4}
&lt;p&gt;, &lt;/p&gt;
{prop2: 234.5}
&lt;p&gt;, ...]}&lt;br/&gt;
3. {p: [&lt;/p&gt;
{prop1: 123.4, prop2: 234.5, ...}
&lt;p&gt;]}&lt;/p&gt;

&lt;p&gt;Without success in getting the index to be used. In fact, it only works in case 1 when I index a subset of the 100 properties. &lt;/p&gt;

&lt;p&gt;I will take a close look at what my users are searching for and try to index the most frequently used queries. If I really need more, I will recompile MongoDB to bump up the index limit. I know it will be bad for the insert time on the given collection, but in that particular case it is not a problem.&lt;/p&gt;</comment>
                            <comment id="13460" author="eliot" created="Tue, 6 Apr 2010 09:39:16 +0000"  >&lt;p&gt;ok - closing since doesn&apos;t really make sense&lt;/p&gt;</comment>
                            <comment id="13455" author="loic" created="Tue, 6 Apr 2010 02:36:24 +0000"  >&lt;p&gt;In my particular case, the way the data is stored is not a problem, I can also store the way Aaron proposed. As long as my min/max query runs against the index, I am happy. Also note that in my case, I insert/update the data once a month and query 1000&apos;s a day, so time to insert is not an issue.&lt;/p&gt;</comment>
                            <comment id="13452" author="aaron" created="Tue, 6 Apr 2010 02:08:14 +0000"  >&lt;p&gt;I don&apos;t think this works in general.  An index on &apos;p&apos; doesn&apos;t necessarily tell us where to find &apos;x&apos; values of interest:&lt;/p&gt;

&lt;p&gt;db.c.find( {p: {$elemMatch: {x:{$gt:1}}}} )&lt;/p&gt;

&lt;p&gt;we may have documents like&lt;br/&gt;
{p:[&lt;/p&gt;
{y:2,x:4}
&lt;p&gt;]}&lt;br/&gt;
{p:[&lt;/p&gt;
{y:2,z:10,x:-1}
&lt;p&gt;]}&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="10190" name="arraytest.js" size="1214" author="loic" created="Mon, 5 Apr 2010 16:02:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 6 Apr 2010 02:08:14 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        13 years, 45 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ian@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            13 years, 45 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>aaron</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>loic</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrpobj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hriolb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>22636</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hszy8n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>