<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:18:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-8954] Index Key Extraction Much Slower for Some Data Schemas Than Others</title>
                <link>https://jira.mongodb.org/browse/SERVER-8954</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;I have a collection that essentially is an _id and a list of objects of&lt;br/&gt;
complex type. There is a multikey index on this collection from the _id&lt;br/&gt;
to the id of each of the items in the list.&lt;/p&gt;

&lt;p&gt;ie:&lt;/p&gt;

&lt;p&gt;  db.test.ensureIndex({_id: 1, &apos;items.id&apos;: 1})&lt;br/&gt;
  db.test.insert({_id: ObjectId(&quot;....&quot;),&lt;br/&gt;
   items: [&lt;br/&gt;
      {id: 1},&lt;br/&gt;
      {id: 2},&lt;br/&gt;
      ....&lt;br/&gt;
      {id: 1000},&lt;br/&gt;
    ]&lt;br/&gt;
 })&lt;/p&gt;

&lt;p&gt;With a small number of items in the list, insert and update times for an&lt;br/&gt;
individual item are reasonable, but once the number of items in the list&lt;br/&gt;
is greater than 1,000 the time to insert or update just one item&lt;br/&gt;
starts to increase dramatically:&lt;/p&gt;

&lt;p&gt;  Inserted document with 1000 items in 0.048251 seconds&lt;br/&gt;
  Updated document ($set) with 1000 items in 0.104173 seconds&lt;br/&gt;
  Updated document ($push) with 1000 items in 0.318420 seconds&lt;br/&gt;
  Inserted document with 2000 items in 0.199266 seconds&lt;br/&gt;
  Updated document ($set) with 2000 items in 0.483723 seconds&lt;br/&gt;
  Updated document ($push) with 2000 items in 1.026530 seconds&lt;br/&gt;
  Inserted document with 3000 items in 0.593618 seconds&lt;br/&gt;
  Updated document ($set) with 3000 items in 1.053177 seconds&lt;br/&gt;
  Updated document ($push) with 3000 items in 2.245902 seconds&lt;br/&gt;
  Inserted document with 4000 items in 0.991389 seconds&lt;br/&gt;
  Updated document ($set) with 4000 items in 1.898991 seconds&lt;br/&gt;
  Updated document ($push) with 4000 items in 4.001129 seconds&lt;br/&gt;
  Inserted document with 5000 items in 1.490980 seconds&lt;br/&gt;
  Updated document ($set) with 5000 items in 3.080210 seconds&lt;br/&gt;
  Updated document ($push) with 5000 items in 6.076108 seconds&lt;br/&gt;
  Inserted document with 6000 items in 2.144194 seconds&lt;br/&gt;
  Updated document ($set) with 6000 items in 4.325883 seconds&lt;/p&gt;

&lt;p&gt;I&apos;ve attached a test program that creates the output described above. It&lt;br/&gt;
will insert a test document with an ever increasing number of items. It&lt;br/&gt;
will then $set the list on the newly inserted document to itself. After&lt;br/&gt;
that it will attempt to $push one new item onto the list.&lt;/p&gt;
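
&lt;p&gt;For readers without the attachment, the steps above can be sketched roughly as follows. This is a minimal sketch and not the attached test-indexes.py itself: the helper names (make_items, timed, run_case) are hypothetical, and the collection argument is assumed to be any object exposing pymongo-style insert_one and update_one methods, with the index already created.&lt;/p&gt;

```python
import time


def make_items(count, use_objects=True):
    """Build the list stored under "items": objects like {"id": 1} or plain integers."""
    if use_objects:
        return [{"id": i} for i in range(1, count + 1)]
    return list(range(1, count + 1))


def timed(label, count, operation):
    """Run operation() once and report the elapsed wall-clock time."""
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    print("%s with %d items in %f seconds" % (label, count, elapsed))
    return elapsed


def run_case(collection, doc_id, count, use_objects=True):
    """Insert one document, $set its list to itself, then $push one more item.

    Assumption: `collection` behaves like a pymongo collection
    (insert_one / update_one); `doc_id` is any value usable as _id.
    """
    items = make_items(count, use_objects)
    extra = {"id": count + 1} if use_objects else count + 1
    return [
        timed("Inserted document", count,
              lambda: collection.insert_one({"_id": doc_id, "items": items})),
        timed("Updated document ($set)", count,
              lambda: collection.update_one({"_id": doc_id}, {"$set": {"items": items}})),
        timed("Updated document ($push)", count,
              lambda: collection.update_one({"_id": doc_id}, {"$push": {"items": extra}})),
    ]
```

&lt;p&gt;Calling run_case in a loop with count set to 1000, 2000, 3000 and so on, and a fresh doc_id each time, would produce output in the same shape as the timings listed above.&lt;/p&gt;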

&lt;p&gt;I&apos;ve run the same test above with integers as the list items instead&lt;br/&gt;
of an object. As the number of items increases the insert/update speed&lt;br/&gt;
slows down, but the performance doesn&apos;t degrade nearly as severely as it&lt;br/&gt;
does when using objects.&lt;/p&gt;</description>
                <environment>Ubuntu 12.04 LTS</environment>
        <key id="68143">SERVER-8954</key>
            <summary>Index Key Extraction Much Slower for Some Data Schemas Than Others</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="michael@songza.com">Michael Henson</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 Mar 2013 16:37:34 +0000</created>
                <updated>Tue, 8 Dec 2015 23:01:57 +0000</updated>
                            <resolved>Tue, 8 Dec 2015 23:01:57 +0000</resolved>
                                    <version>2.2.3</version>
                                                    <component>Index Maintenance</component>
                                        <votes>0</votes>
                                    <watches>7</watches>
                                                                                                                <comments>
                            <comment id="1108913" author="david.storch" created="Tue, 8 Dec 2015 23:01:36 +0000"  >&lt;p&gt;This looks like a duplicate of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-8192&quot; title=&quot;Optimize btree key generation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-8192&quot;&gt;&lt;del&gt;SERVER-8192&lt;/del&gt;&lt;/a&gt;, which was fixed in development version 3.1.1 and is first generally available in production version 3.2.0. Closing as a Duplicate.&lt;/p&gt;</comment>
                            <comment id="289424" author="michael@songza.com" created="Thu, 14 Mar 2013 19:45:54 +0000"  >&lt;p&gt;Thanks for the follow-up Andy. We were able to work around this by removing the index on that field. We&apos;ll be more mindful of that limitation in the future.&lt;/p&gt;</comment>
                            <comment id="289238" author="schwerin" created="Thu, 14 Mar 2013 16:56:21 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=michael%40songza.com&quot; class=&quot;user-hover&quot; rel=&quot;michael@songza.com&quot;&gt;michael@songza.com&lt;/a&gt;, this is a straight up performance issue having to do with how the indexing system extracts key data from documents during updates, inserts, etc.  Improving the performance while maintaining existing behavior will be tricky, and because it involves durable data, will require extensive testing.  I&apos;m moving this issue into triage, but in the short term I would recommend altering your schema when arrays might be very large.&lt;/p&gt;</comment>
                            <comment id="288330" author="schwerin" created="Wed, 13 Mar 2013 18:10:42 +0000"  >&lt;p&gt;I repeated approximately &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=michael%40songza.com&quot; class=&quot;user-hover&quot; rel=&quot;michael@songza.com&quot;&gt;michael@songza.com&lt;/a&gt;&apos;s experiment, and I used a build of mongod with the CPU profiler support enabled.  Attached are SVG renderings of hot call paths.  I tried to tune the two cases so that both took similar amounts of wall clock time, so the integer case has many more iterations in its profile than the object case.&lt;/p&gt;

&lt;p&gt;Notice that in the &quot;insert object&quot; case 90% of the time is spent in BSONObj::getField and its callees.  My judgement is that a lot of time is going into string comparisons and reparsing data from the raw bson representation.&lt;/p&gt;</comment>
                            <comment id="288246" author="michael@songza.com" created="Wed, 13 Mar 2013 16:49:20 +0000"  >&lt;p&gt;I can understand both the cost of a document move as well as the cost of the initial indexing. The surprising thing here though, and I apologize for probably not being as clear as I could have about this, is that the cost of indexing a list of objects is so much greater than the cost of doing the same with a simple list of integers. Note the numbers below:&lt;/p&gt;

&lt;p&gt;&amp;gt; ./test-indexes.py --object&lt;br/&gt;
Inserted 5000 in 1.005717 seconds&lt;br/&gt;
Updated ($set) 5000 in 1.799109 seconds&lt;br/&gt;
Updated/addone ($set) 5000 in 3.646754 seconds&lt;br/&gt;
Updated ($push) 5000 in 3.605636 seconds&lt;/p&gt;

&lt;p&gt;&amp;gt; ./test-indexes.py --integer&lt;br/&gt;
Inserted 5000 in 0.059405 seconds&lt;br/&gt;
Updated ($set) 5000 in 0.059590 seconds&lt;br/&gt;
Updated/addone ($set) 5000 in 0.111611 seconds&lt;br/&gt;
Updated ($push) 5000 in 0.108521 seconds&lt;/p&gt;
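
&lt;p&gt;The per-operation slowdown implied by the two runs above can be worked out with a few lines of arithmetic. The sketch below only divides the timings already listed; the dictionary names are illustrative.&lt;/p&gt;

```python
# Timings in seconds copied from the two runs above (5000 items each).
object_times = {
    "Inserted": 1.005717,
    "Updated ($set)": 1.799109,
    "Updated/addone ($set)": 3.646754,
    "Updated ($push)": 3.605636,
}
integer_times = {
    "Inserted": 0.059405,
    "Updated ($set)": 0.059590,
    "Updated/addone ($set)": 0.111611,
    "Updated ($push)": 0.108521,
}

# Slowdown factor of the object case relative to the integer case, per operation.
ratios = {name: object_times[name] / integer_times[name] for name in object_times}

for name, ratio in ratios.items():
    print("%s: %.1fx slower with objects" % (name, ratio))
```

&lt;p&gt;This works out to roughly 17x for the insert and between 30x and 33x for the three update variants.&lt;/p&gt;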

&lt;p&gt;Those numbers suggest that it&apos;s a little over 30 times faster to index a list of simple types than it is to index a list of complex types. While I probably would not be willing to have a 3.6 second lock in the database in a production setup, it might be acceptable in our use case for a 0.1 second lock on this particular dataset. Can you think of any reason for the large discrepancy between these two types of list items?&lt;/p&gt;</comment>
                            <comment id="287591" author="bryan.reinero@10gen.com" created="Tue, 12 Mar 2013 21:18:04 +0000"  >&lt;p&gt;Hi Michael,&lt;/p&gt;

&lt;p&gt;There are a couple of interesting mechanisms in play here:&lt;/p&gt;

&lt;p&gt;The composite key declared is a multi-key index built on an array. Multi-key indexes contain an entry in the index b-tree for every element in the array. If you are indexing arrays that have many elements, the resultant index will be large. Now, since MongoDB keeps indexes consistent with the data they cover, any insertion and any subsequent update to the array means that the index will need to be updated too. The impact of updates will be proportional to the number of elements per document.&lt;/p&gt;
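
&lt;p&gt;As a rough illustration of the point above, the sketch below models how one document fans out into one index key per array element. It is a simplified model with hypothetical function names, not MongoDB&apos;s actual key generation code: it ignores missing fields, nested arrays and key ordering, and it assumes the indexed values are hashable scalars.&lt;/p&gt;

```python
def get_path_values(doc, path):
    """Return every value reachable at a dotted path, fanning out over arrays."""
    values = [doc]
    for part in path.split("."):
        next_values = []
        for value in values:
            # An array contributes each of its elements to the traversal.
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_values.append(candidate[part])
        values = next_values
    # The final value may itself be an array (for example a list of integers).
    flattened = []
    for value in values:
        flattened.extend(value if isinstance(value, list) else [value])
    return flattened


def multikey_entries(doc, array_path):
    """One (_id, element) key per distinct array element, in first-seen order."""
    keys = ((doc["_id"], value) for value in get_path_values(doc, array_path))
    return list(dict.fromkeys(keys))


doc = {"_id": "a", "items": [{"id": 1}, {"id": 2}, {"id": 3}]}
print(multikey_entries(doc, "items.id"))
```

&lt;p&gt;In this model a document with 1000 items yields 1000 keys, and every one of them has to be produced again whenever the array is rewritten.&lt;/p&gt;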

&lt;p&gt;The effect is compounded when the indexed array is subject to growth. Documents inserted into MongoDB are stored adjacent to one another in an effort to make efficient use of disk space. However, if an update increases the size of the updated document, that document won&apos;t fit in its original space. The document will need to move on disk. Since MongoDB&apos;s index nodes contain the location of the document referenced, any move of a document on disk means that the index must be updated. In the course of the &quot;$push&quot; test, the original document has 1000 elements with corresponding positions in the index b-tree, so the consequence of the $push of the additional 1000 elements means that the original document will move on disk, necessitating an update of the existing index nodes as well as the insertion of new nodes to cover the 1000 new elements.&lt;/p&gt;

&lt;p&gt;Of course, multi-key indexes are a useful feature, but must be used with consideration to the size and maintenance costs they&apos;ll incur. Indexing strategies which include arrays subject to unbounded growth have the potential to grow to a very large size for any collection of significant size, and increase latency. This is why schema design is such an important aspect of using MongoDB efficiently. Large unbounded indexed arrays can be avoided by designing your schema with a more normalized strategy.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="62184">SERVER-8192</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="24991" name="integer.svg" size="126074" author="schwerin@mongodb.com" created="Wed, 13 Mar 2013 18:10:42 +0000"/>
                            <attachment id="24990" name="object.svg" size="70111" author="schwerin@mongodb.com" created="Wed, 13 Mar 2013 18:10:42 +0000"/>
                            <attachment id="24940" name="test-indexes.py" size="1520" author="michael@songza.com" created="Tue, 12 Mar 2013 16:37:35 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 12 Mar 2013 21:18:04 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        8 years, 10 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            8 years, 10 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>schwerin@mongodb.com</customfieldvalue>
            <customfieldvalue>bryan.reinero</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>michael@songza.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrn1uf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrfpnj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3961</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;To run the test case using objects in the array:&lt;br/&gt;
  python test-indexes.py --object&lt;/p&gt;

&lt;p&gt;To run the test case using integers in the array:&lt;br/&gt;
  python test-indexes.py --integer&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlkun:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>