<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:46:26 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-37590] Investigate performance implications of using WiredTiger read-once cursors</title>
                <link>https://jira.mongodb.org/browse/SERVER-37590</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Using &lt;tt&gt;read_once=true&lt;/tt&gt; cursors has not yet been shown to provide any performance benefit, at least in the context of background index builds. Investigate why that is the case, and decide when they should or should not be used.&lt;/p&gt;</description>
                <environment></environment>
        <key id="617395">SERVER-37590</key>
            <summary>Investigate performance implications of using WiredTiger read-once cursors</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="louis.williams@mongodb.com">Louis Williams</assignee>
                                    <reporter username="louis.williams@mongodb.com">Louis Williams</reporter>
                        <labels>
                            <label>nyc</label>
                    </labels>
                <created>Fri, 12 Oct 2018 13:45:15 +0000</created>
                <updated>Wed, 8 Apr 2020 16:10:50 +0000</updated>
                            <resolved>Wed, 7 Nov 2018 14:58:43 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>8</watches>
                                                                                                                <comments>
                            <comment id="2054483" author="tess.avitabile" created="Wed, 7 Nov 2018 15:03:50 +0000"  >&lt;p&gt;Thanks, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=louis.williams&quot; class=&quot;user-hover&quot; rel=&quot;louis.williams&quot;&gt;louis.williams&lt;/a&gt;! And thanks for the detailed explanation of your perf investigation.&lt;/p&gt;</comment>
                            <comment id="2054474" author="louis.williams" created="Wed, 7 Nov 2018 14:58:43 +0000"  >&lt;p&gt;Closing because we are going to proceed with&#160;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-37269&quot; title=&quot;Use read_once cursors to do foreground index build collection scans&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-37269&quot;&gt;&lt;del&gt;SERVER-37269&lt;/del&gt;&lt;/a&gt;&#160;to use read_once cursors for index builds.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=tess.avitabile&quot; class=&quot;user-hover&quot; rel=&quot;tess.avitabile&quot;&gt;tess.avitabile&lt;/a&gt;/&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matthew.russotto&quot; class=&quot;user-hover&quot; rel=&quot;matthew.russotto&quot;&gt;matthew.russotto&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-36068&quot; title=&quot;Expose a user-accessible cursor option to avoid caching data from reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-36068&quot;&gt;&lt;del&gt;SERVER-36068&lt;/del&gt;&lt;/a&gt; should be unblocked now&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="2053752" author="louis.williams" created="Tue, 6 Nov 2018 20:34:22 +0000"  >&lt;p&gt;I&apos;ve done a significant amount of testing to attempt to demonstrate a change in index build speed and read throughput, but found that challenging for several reasons. Instead, I will provide an examination of the changes in cache behavior. Despite there being no significant demonstrable improvement or regression in those metrics, I believe this change will still be desirable in general.&lt;/p&gt;

&lt;p&gt;First, why was it hard to demonstrate a performance improvement/regression? These are some outstanding issues with my workload:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Reliability: over 5 runs of the same workload, the results can vary by up to 10-15%. It is hard to draw any conclusions about performance because the workload metrics are not particularly reliable.&lt;/li&gt;
	&lt;li&gt;Data: I used highly compressible test documents with 7:1 compression on average, which means a cache miss incurs a lower penalty, because less compressed data needs to be paged in.&lt;/li&gt;
	&lt;li&gt;Filesystem cache: it turns out the operating system is really good at caching data. This means the penalty of a cache miss is extremely low.&#160;The test was previously never actually going to disk. I then ran workloads after clearing the filesystem cache before each run, which showed some promising improvements in read throughput, but because of the other two reasons listed, nothing that should be used to draw conclusions.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;&lt;a name=&quot;Procedure&quot;&gt;&lt;/a&gt;Procedure&lt;/h3&gt;

&lt;p&gt;I modified my test procedure as follows:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Load data: Each collection&apos;s size is 75% of the total WT cache size, 2GB&lt;/li&gt;
	&lt;li&gt;Shut down mongod, then clear caches: &lt;tt&gt;sync &amp;amp;&amp;amp; echo 3 | tee /proc/sys/vm/drop_caches&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;Do a collection scan on the hot collection to load it into the WT cache&lt;/li&gt;
	&lt;li&gt;Print &lt;tt&gt;collection.stats().wiredTiger.cache&lt;/tt&gt; for both collections&lt;/li&gt;
	&lt;li&gt;Perform an index build on the &quot;cold&quot; collection while performing randomized reads on the &quot;hot&quot; collection&lt;/li&gt;
	&lt;li&gt;Print &lt;tt&gt;collection.stats().wiredTiger.cache&lt;/tt&gt; for both collections again&lt;/li&gt;
	&lt;li&gt;&lt;em&gt;Time a collection scan on the &quot;hot&quot; collection, which should provide an indication of how much data is still in cache.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;&lt;a name=&quot;Results&quot;&gt;&lt;/a&gt;Results&lt;/h3&gt;

&lt;p&gt;Each block of 3 columns shows data for read-once False, True, and the difference between the two (True minus False), respectively. Each row shows, for each collection, how much data was in cache before, after, and the difference between the two measurements.&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;&quot;bytes currently in cache&quot;&lt;/b&gt;&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;read-once: FALSE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;read-once: TRUE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;diff: read-once TRUE - FALSE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;hot&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;cold&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;hot&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;cold&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;hot&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;cold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;1,780,855,108&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;104,107&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;1,780,855,108&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;104,107&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;765,809,035&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;943,253,031&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;1,692,619,364&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;11,658,137&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;926,810,329&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;-931,594,894&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;difference&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;-1,015,046,073&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;943,148,924&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;difference&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;-88,235,744&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;11,554,030&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;difference&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;926,810,329&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;-931,594,894&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;This can be interpreted as follows:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;The &quot;hot&quot; collection had 927MB more data in cache&#160;with read-once: true than with read-once: false&lt;/li&gt;
	&lt;li&gt;The &quot;cold&quot; collection had 932MB less data in cache with read-once: true than with read-once: false&#160;&lt;/li&gt;
	&lt;li&gt;This shows there is more &quot;hot&quot; data available in the WT cache.&lt;/li&gt;
&lt;/ul&gt;
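
&lt;p&gt;The &quot;diff&quot; columns above are plain subtractions of the &quot;after&quot; rows. A minimal sketch of that arithmetic follows; the variable and function names are illustrative assumptions and are not part of the original test.&lt;/p&gt;

```python
# "after" values copied from the "bytes currently in cache" table above.
# The names used here are hypothetical and only serve this illustration.
after_false = {"hot": 765_809_035, "cold": 943_253_031}
after_true = {"hot": 1_692_619_364, "cold": 11_658_137}


def diff_true_minus_false(true_run, false_run):
    """Return the per-collection difference: read-once TRUE minus FALSE."""
    return {name: true_run[name] - false_run[name] for name in true_run}


delta = diff_true_minus_false(after_true, after_false)
for name, value in delta.items():
    # Print the difference in bytes and, rounded, in decimal megabytes.
    print(f"{name}: {value:+,} bytes ({value / 1e6:+.0f} MB)")
```

&lt;p&gt;A positive value means more of that collection stayed in the WT cache with read-once: true.&lt;/p&gt;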


&lt;p&gt;&lt;b&gt;&quot;unmodified pages evicted&quot;&lt;/b&gt;&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;read-once: FALSE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;read-once: TRUE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;diff: read-once TRUE - FALSE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;hot&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;cold&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;hot&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;cold&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;hot&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;cold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;34,635&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;27,460&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;3,377&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;61,215&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;-31,258&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;33,755&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;diff&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;34,635&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;27,460&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;diff&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;3,377&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;61,215&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;diff&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;-31,258&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;33,755&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;This can be interpreted as follows:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;The &quot;hot&quot; collection had 31k fewer cache evictions with read-once: true than with read-once: false&lt;/li&gt;
	&lt;li&gt;The &quot;cold&quot; collection had 34k more cache evictions with read-once: true than with read-once: false&#160;&lt;/li&gt;
	&lt;li&gt;This shows there is more &quot;hot&quot; data being kept in cache, though there were more cache evictions overall&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Additionally, I mentioned that I ran a single collection scan on the &quot;hot&quot; collection after each test. The results showed with very low variance over 5 trials that collection scans were in fact faster because more &quot;hot&quot; data was already in cache.&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;read-once: FALSE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;collection scan time (ms)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;read-once: TRUE&lt;/b&gt;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;collection scan time (ms)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&lt;b&gt;% change&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;avg&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;6325.6&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;avg&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;5996.6&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;-5.20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;min&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;6229&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;min&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;5916&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;-5.02%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;stddev&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;67.88&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;stddev&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;54.34&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;ul&gt;
	&lt;li&gt;This 5.2% decrease in scan time (about 329 ms on average) is several times larger than the standard deviation of either configuration, so it is significant.&lt;/li&gt;
	&lt;li&gt;This shows that even when WT cache misses for this collection scan go entirely to the filesystem cache and never to disk, there is still a measurable performance improvement.&lt;/li&gt;
&lt;/ul&gt;
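
&lt;p&gt;The percentage change and its size relative to the run-to-run spread can be checked with a few lines. This is a rough sketch only: dividing the gap by the larger standard deviation is an assumed rule of thumb, not a formal significance test, and the names are illustrative.&lt;/p&gt;

```python
# Averages and standard deviations (ms) copied from the scan-time table above.
avg_false, std_false = 6325.6, 67.88
avg_true, std_true = 5996.6, 54.34


def percent_change(new, old):
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


change = percent_change(avg_true, avg_false)
gap_ms = avg_false - avg_true
# Assumption: use the larger of the two standard deviations as a
# conservative yardstick for run-to-run noise.
gap_in_stddevs = gap_ms / max(std_false, std_true)

print(f"change: {change:.2f}%")
print(f"gap: {gap_ms:.1f} ms, or {gap_in_stddevs:.1f} standard deviations")
```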


&lt;h3&gt;&lt;a name=&quot;Conclusion&quot;&gt;&lt;/a&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Given this data, I think it is safe to conclude that read-once cursors do provide &lt;b&gt;some&lt;/b&gt;&#160;measurable benefit in reducing cache thrashing, and do not appear to have significant disadvantages. The data implies they behave exactly as we would expect, so my belief is that we should plan on using read-once cursors by default where we believe they may be beneficial.&lt;/p&gt;</comment>
                            <comment id="2051017" author="daniel.gottlieb@10gen.com" created="Sat, 3 Nov 2018 17:44:06 +0000"  >&lt;p&gt;I also ran &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=louis.williams&quot; class=&quot;user-hover&quot; rel=&quot;louis.williams&quot;&gt;louis.williams&lt;/a&gt;&apos; linked patch and noticed there are no read requests going to disk during the run. I believe the OS filesystem cache is effectively keeping everything in memory.&lt;/p&gt;</comment>
                            <comment id="2051006" author="bruce.lucas@10gen.com" created="Sat, 3 Nov 2018 16:27:13 +0000"  >&lt;p&gt;A couple of other things to consider:&lt;/p&gt;

&lt;p&gt;If the hot collection fits entirely in the o/s filesystem cache, then the cost of a WT cache miss may not be that high because it doesn&apos;t require i/o. Do you see a large difference in read throughput comparing no index build with index build?&lt;/p&gt;

&lt;p&gt;Maybe the high rate of activity on the hot collection keeps it in cache whereas the cold collection gets evicted preferentially because it&apos;s only read once. You might see a larger difference if you stop the reads, do the index build, then start the reads again - initially the rate should be lower while the hot collection fills the cache again.&lt;/p&gt;</comment>
                            <comment id="2050490" author="dan@10gen.com" created="Fri, 2 Nov 2018 18:01:34 +0000"  >&lt;p&gt;Have you examined the ftdc data to confirm that the collection-specific cache stats on the cold table are behaving as you expect?  Bruce or Kelsey can show you how to gather those stats and feed them into t2.&lt;/p&gt;</comment>
                            <comment id="2050481" author="louis.williams" created="Fri, 2 Nov 2018 17:52:48 +0000"  >&lt;p&gt;I&apos;ll summarize here some tests that show insignificant results of using read_once cursors, at least for foreground index builds, which is similar to how future hybrid indexes will behave.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/louiswilliams/mongo/commit/15d1f23c9f2f950ad0c7e17d5489756e9deaea8e&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;I made a test&lt;/a&gt; that builds a foreground index on one &quot;cold&quot; collection and performs random point-lookups on another &quot;hot&quot; collection. The WT cache size was fixed at 1GB, and each collection was filled with 4k documents, enough to fill 90% of the cache. I measured the total index build time (ms) and total ops/s across all client threads in that period of time.&lt;/p&gt;

&lt;p&gt;I ran 5 trials each of read_once=true and read_once=false:&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;readOnce?&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;clientThreads&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;docs as % of cache size&#160;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;index build time (stddev)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;index build time (avg)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#160;% change&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;read ops/s (stddev)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;read ops/s (avg)&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;% change&#160;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;FALSE&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;16&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;90%&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;2,065.14&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;35260.00&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#8211;&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;561.46&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;8506.514286&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&#8211;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;TRUE&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;16&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;90%&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;4,263.36&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;36954.00&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;4.80%&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;742.81&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;8401.594595&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;-1.23%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
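
&lt;p&gt;One way to read the table above is to compare each change in the average against the standard deviations of the trials. The sketch below does that; the data layout and the choice of the smaller standard deviation as the yardstick are assumptions made for illustration, not part of the original test.&lt;/p&gt;

```python
# (average, standard deviation) pairs copied from the table above.
rows = {
    "index build time (ms)": {
        "false": (35260.00, 2065.14),
        "true": (36954.00, 4263.36),
    },
    "read ops/s": {
        "false": (8506.514286, 561.46),
        "true": (8401.594595, 742.81),
    },
}


def summarize(false_run, true_run):
    """Return (percent change, gap measured in standard deviations)."""
    avg_f, std_f = false_run
    avg_t, std_t = true_run
    pct = (avg_t - avg_f) / avg_f * 100.0
    # Assumption: divide by the smaller stddev, which favors finding an effect.
    ratio = abs(avg_t - avg_f) / min(std_f, std_t)
    return pct, ratio


summary = {metric: summarize(r["false"], r["true"]) for metric, r in rows.items()}
for metric, (pct, ratio) in summary.items():
    print(f"{metric}: {pct:+.2f}% ({ratio:.2f} standard deviations)")
```

&lt;p&gt;Both gaps come out below one standard deviation, which is consistent with the results not being significant.&lt;/p&gt;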


&lt;p&gt;I analyzed the t2 and can confirm there is less cache pressure in general, at least in cases where the cache exceeds 80%, but the results do not show anything significant.&#160;&lt;/p&gt;

&lt;p&gt;I am open to ideas for a testing configuration that may better demonstrate improvements, but at least in the scenario where we would expect to see an improvement, there is none.&lt;/p&gt;</comment>
                            <comment id="2049566" author="tess.avitabile" created="Thu, 1 Nov 2018 21:54:31 +0000"  >&lt;p&gt;Thank you for letting me know. I am not too concerned about bitrot for &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-36068&quot; title=&quot;Expose a user-accessible cursor option to avoid caching data from reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-36068&quot;&gt;&lt;del&gt;SERVER-36068&lt;/del&gt;&lt;/a&gt;. We&apos;re going to close out the Faster Initial Sync project soon, so we will remove &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-36068&quot; title=&quot;Expose a user-accessible cursor option to avoid caching data from reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-36068&quot;&gt;&lt;del&gt;SERVER-36068&lt;/del&gt;&lt;/a&gt; from the project.&lt;/p&gt;</comment>
                            <comment id="2049501" author="milkie" created="Thu, 1 Nov 2018 21:14:15 +0000"  >&lt;p&gt;I don&apos;t expect any further movement on this issue for another two sprints at least.  We could expedite it if you feel that the work already staged for &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-36068&quot; title=&quot;Expose a user-accessible cursor option to avoid caching data from reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-36068&quot;&gt;&lt;del&gt;SERVER-36068&lt;/del&gt;&lt;/a&gt; will bitrot.&lt;/p&gt;</comment>
                            <comment id="2048812" author="tess.avitabile" created="Thu, 1 Nov 2018 13:55:12 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=milkie&quot; class=&quot;user-hover&quot; rel=&quot;milkie&quot;&gt;milkie&lt;/a&gt;, when do you expect the Storage team to look into this issue? I&apos;m interested in when we should plan to schedule &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-36068&quot; title=&quot;Expose a user-accessible cursor option to avoid caching data from reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-36068&quot;&gt;&lt;del&gt;SERVER-36068&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="608276">SERVER-37269</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="570270">SERVER-36068</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1290761">SERVER-47118</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 1 Nov 2018 13:55:12 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 14 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>suganthi.mani@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 14 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>bruce.lucas@mongodb.com</customfieldvalue>
                            <customfieldvalue>daniel.gottlieb@mongodb.com</customfieldvalue>
                            <customfieldvalue>dan@mongodb.com</customfieldvalue>
                            <customfieldvalue>milkie@mongodb.com</customfieldvalue>
                            <customfieldvalue>louis.williams@mongodb.com</customfieldvalue>
                            <customfieldvalue>tess.avitabile@mongodb.com</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hua2vr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr8gcv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue id="2595">Storage NYC 2018-11-05</customfieldvalue>
                            <customfieldvalue id="2596">Storage NYC 2018-11-19</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hu9p53:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>