<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:56:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-21284] $lookup should cache query results</title>
                <link>https://jira.mongodb.org/browse/SERVER-21284</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;If you are doing an aggregation on a large collection, and your aggregation contains a $lookup stage which queries a smaller collection (think &amp;lt;= 100 documents), the $lookup stage will issue a query for each document in the larger collection.&lt;/p&gt;

&lt;p&gt;The $lookup stage should cache the results of the query to avoid unnecessary work in situations like this.&lt;/p&gt;

&lt;h5&gt;&lt;a name=&quot;OriginalDescription&quot;&gt;&lt;/a&gt;Original Description&lt;/h5&gt;

&lt;p&gt;I have tried the $lookup aggregator and the performance was not as expected.&lt;br/&gt;
I have one collection with about 350K documents and one with 100 documents.&lt;br/&gt;
There are two indexes: one on the &#8220;localfield&#8221; and one on the &#8220;foreignfield&#8221;.&lt;br/&gt;
I am trying to join the first collection with the second.&lt;/p&gt;

&lt;p&gt;After an exchange of emails with Norberto: &quot;it looks like it is not using either index, and therefore it&apos;s taking so long.&quot;&lt;/p&gt;

&lt;p&gt;Please find below the link to test:&lt;br/&gt;
&lt;a href=&quot;https://github.com/bappr/lookup-test&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/bappr/lookup-test&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="237722">SERVER-21284</key>
            <summary>$lookup should cache query results</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="bappr">Benjamin Appr&#233;derisse</reporter>
                        <labels>
                            <label>optimization</label>
                            <label>performance</label>
                            <label>query-product-scope-1</label>
                            <label>query-product-urgency-2</label>
                            <label>query-product-value-2</label>
                    </labels>
                <created>Wed, 4 Nov 2015 15:34:21 +0000</created>
                <updated>Thu, 2 Mar 2023 19:05:51 +0000</updated>
                                            <version>3.2.0-rc1</version>
                                                    <component>Aggregation Framework</component>
                                        <votes>11</votes>
                                    <watches>32</watches>
                                                                                                                <comments>
                            <comment id="4760689" author="xiaoqiang.chou@gmail.com" created="Thu, 18 Aug 2022 15:21:05 +0000"  >&lt;p&gt;Any progress on this?&lt;/p&gt;</comment>
                            <comment id="4339340" author="JIRAUSER1258820" created="Mon, 7 Feb 2022 15:45:36 +0000"  >&lt;p&gt;Are there any updates related to this issue? Our company has run into this issue. Even though we have been using the embedded documents approach, and the database is blazing fast when executing standard CRUD operations, we can&apos;t avoid some of the lookup &quot;join&quot; operations that are required to generate certain reports. Even when using indexes to support those join operations, joining tables that have 100K+ documents takes more than 30 seconds! It&apos;s a pity that this issue hasn&apos;t been resolved yet because, aside from that, MongoDB has done a stellar job!&lt;/p&gt;</comment>
                            <comment id="1080932" author="charlie.swanson" created="Thu, 5 Nov 2015 19:53:17 +0000"  >&lt;p&gt;Ok great! I have filed &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-21312&quot; title=&quot;$lookup should batch query requests&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-21312&quot;&gt;SERVER-21312&lt;/a&gt; if you want to track that ticket as well.&lt;/p&gt;</comment>
                            <comment id="1079748" author="norberto@10gen.com" created="Wed, 4 Nov 2015 20:21:48 +0000"  >&lt;p&gt;Hi Benjamin, &lt;/p&gt;

&lt;p&gt;We have drilled a bit further into the issue you raised and, as &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=charlie.swanson&quot; class=&quot;user-hover&quot; rel=&quot;charlie.swanson&quot;&gt;charlie.swanson&lt;/a&gt; mentions, the system is actually using the indexes as expected. &lt;br/&gt;
The problem comes down to the cardinality of the overall operation, which needs to run a query 350K times against the &lt;b&gt;from&lt;/b&gt; collection to accomplish the $lookup. &lt;/p&gt;

&lt;p&gt;N.&lt;/p&gt;</comment>
                            <comment id="1079723" author="bappr" created="Wed, 4 Nov 2015 20:05:19 +0000"  >&lt;p&gt;Hi Charlie,&lt;/p&gt;

&lt;p&gt;Thanks for giving me these explanations.&lt;/p&gt;

&lt;p&gt;We can proceed this way.&lt;/p&gt;

&lt;p&gt;Benjamin&lt;/p&gt;</comment>
                            <comment id="1079619" author="charlie.swanson" created="Wed, 4 Nov 2015 18:58:44 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bappr&quot; class=&quot;user-hover&quot; rel=&quot;bappr&quot;&gt;bappr&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;I have been in communication with Norberto, and it looks like the $lookup stage is indeed using an index on the foreign collection. It is not using one on the local collection, but that is expected, and probably the fastest way to proceed, as it is looking at all documents in the collection.&lt;/p&gt;

&lt;p&gt;So my impression is that $lookup is behaving exactly as expected. It is probably slow because it has to do 350K queries. If this is significantly slower than doing those queries on their own, that would indeed be unexpected, and a likely bug.&lt;/p&gt;

&lt;p&gt;That said, there are many potential optimizations that we are not yet taking advantage of for the $lookup stage. For instance, we could maintain a cache of the results of the queries, or batch the lookups to do fewer queries. These are things we have thought about doing, but do not have tickets in JIRA yet.&lt;/p&gt;

&lt;p&gt;If this is more what you meant by &quot;performance is not as expected&quot;, we can morph this ticket into the request to cache the results, and open another ticket for batching the lookups. If that sounds like a good way to proceed, let me know and I&apos;ll go ahead with it.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="1990837">SERVER-64021</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="237997">SERVER-21312</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1407089">SERVER-49461</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 4 Nov 2015 18:42:43 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        1 year, 24 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>joseph.kanaan@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            1 year, 24 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>bappr</customfieldvalue>
            <customfieldvalue>charlie.swanson@mongodb.com</customfieldvalue>
            <customfieldvalue>paplabros@gmail.com</customfieldvalue>
            <customfieldvalue>norberto.leite</customfieldvalue>
            <customfieldvalue>xiaoqiang.chou@gmail.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrkptb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2e6n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs9uon:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>