<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:03:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-43417] Signal the flusher thread to flush instead of calling waitUntilDurable when waiting for {j:true}</title>
                <link>https://jira.mongodb.org/browse/SERVER-43417</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/537168b589a6c7d81163b67515d9adea52ff26e8/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp#L275-L281&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;_lastSyncMutex&lt;/a&gt; is one of the most contended mutexes when running with {j: true}. This optimization reduces contention on _lastSyncMutex and avoids serializing {j: true} writers while they wait for the journal. It also reduces the sync-related I/O rate when there are many {j: true} writers, without adding significant delay to journal waiting.&lt;/p&gt;</description>
                <environment></environment>
        <key id="935346">SERVER-43417</key>
            <summary>Signal the flusher thread to flush instead of calling waitUntilDurable when waiting for {j:true}</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="backlog-server-repl">Backlog - Replication Team</assignee>
                                    <reporter username="lingzhi.deng@mongodb.com">Lingzhi Deng</reporter>
                        <labels>
                    </labels>
                <created>Mon, 23 Sep 2019 15:09:13 +0000</created>
                <updated>Tue, 6 Dec 2022 02:47:39 +0000</updated>
                            <resolved>Wed, 5 Feb 2020 18:55:27 +0000</resolved>
                                                                    <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>12</watches>
                                                                                                                <comments>
                            <comment id="2785457" author="dianna.hohensee" created="Wed, 5 Feb 2020 18:55:28 +0000"  >&lt;p&gt;The work for this ticket has been done in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-45665&quot; title=&quot;Make JournalFlusher flush on command and watiForWriteConcern asynchronously call waitUntilDurable through the JournalFlusher &quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-45665&quot;&gt;&lt;del&gt;SERVER-45665&lt;/del&gt;&lt;/a&gt;. Closing.&lt;/p&gt;</comment>
                            <comment id="2447403" author="lingzhi.deng" created="Wed, 2 Oct 2019 14:43:57 +0000"  >&lt;blockquote&gt;&lt;p&gt;signal a flusher thread and wait for last durable opTime&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;After talking to &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=daniel.gottlieb&quot; class=&quot;user-hover&quot; rel=&quot;daniel.gottlieb&quot;&gt;daniel.gottlieb&lt;/a&gt;, we found that this part is problematic under today&apos;s implementation of the lastDurable opTime, because lastDurable can include oplog holes. Imagine two in-flight transactions, TXN1 and TXN2, with OpTime 1 and OpTime 2 respectively, where TXN2 commits before TXN1 does. TXN2 triggers a flush, and the flusher sets lastDurable to OpTime 2 once it&#8217;s done. Now OpTime 2 is durable/journaled but OpTime 1 is not. When TXN1 commits later and we check whether lastDurable &amp;gt; OpTime 1, the check passes, so TXN1 mistakenly returns without journaling OpTime 1. If the server then crashes and restarts, OpTime 1 is gone. This is only a problem for {w: 1, j: true} because, before PM-1274, journaling is implied for w &amp;gt; 1, as secondaries are not able to see those oplog entries due to oplog visibility rules.&lt;/p&gt;
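
&lt;p&gt;A minimal sketch of the scenario above, as a toy model rather than server code. The names journaled, last_durable, flush and naive_is_journaled are illustrative assumptions and do not correspond to real identifiers in the codebase.&lt;/p&gt;

```python
# Toy model of the oplog-hole scenario; all names here are illustrative only.
journaled = set()   # OpTimes that are actually in the journal
last_durable = 0    # highest OpTime the flusher has reported as durable


def flush(committed_op_times):
    """Journal everything committed so far and advance last_durable."""
    global last_durable
    journaled.update(committed_op_times)
    last_durable = max(journaled)


def naive_is_journaled(op_time):
    """The problematic shortcut: trust last_durable instead of the journal."""
    return last_durable >= op_time


# TXN2 (OpTime 2) commits first and triggers a flush; TXN1 is still in flight.
flush({2})

# TXN1 (OpTime 1) commits afterwards and consults last_durable.
print(naive_is_journaled(1))   # True: TXN1 would return without flushing
print(1 in journaled)          # False: OpTime 1 would be lost on a crash
```

&lt;p&gt;The two printed values disagree, which is the hole: the comparison says OpTime 1 is covered while the journal does not contain it.&lt;/p&gt;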

&lt;p&gt;The takeaway is that today we can &lt;b&gt;not&lt;/b&gt; compare directly against lastDurable to check whether an OpTime is already journaled, even when the OpTime in question is &amp;lt; lastDurable (equality is fine I guess).&lt;/p&gt;

&lt;p&gt;After PM-1274, I believe lastDurable will no longer be ahead of allCommitted (&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=daniel.gottlieb&quot; class=&quot;user-hover&quot; rel=&quot;daniel.gottlieb&quot;&gt;daniel.gottlieb&lt;/a&gt; to confirm). So the aforementioned approach will still work. However, this introduces a behavior/semantic change to {j: true}. If we pin lastDurable to allCommitted, {j: true} will have to wait for all concurrent transactions that have earlier OpTimes to commit (i.e. no hole). Thus, this could potentially mean higher latency for {j: true} writers.&lt;/p&gt;

&lt;p&gt;An alternative is to use a counter as an indicator of whether a log flush has happened since &quot;my&quot; request. Each log flush request (trigger) gets a number under lock for the next log flush, and the flusher makes a cutoff under lock before it actually flushes. It is like buying tickets for the next train and having a cutoff before the train leaves.&lt;/p&gt;
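
&lt;p&gt;A minimal sketch of the counter idea above, assuming a single flusher thread and a condition variable. The class and method names (TicketedFlusher, request_flush, wait_for, flush_once) and the flush_fn callback are hypothetical stand-ins for illustration, not the server&apos;s API.&lt;/p&gt;

```python
import threading


class TicketedFlusher:
    """Toy model of the 'train ticket' idea: each request takes a ticket for
    the next flush, and the flusher fixes a cutoff before it flushes."""

    def __init__(self, flush_fn):
        self._flush_fn = flush_fn      # hypothetical stand-in for the real log flush
        self._cond = threading.Condition()
        self._next_ticket = 1          # ticket handed out for the next flush
        self._completed = 0            # highest ticket whose flush has finished
        self._requested = False        # whether anyone is waiting for a flush

    def request_flush(self):
        """Take a ticket for the next flush and signal the flusher."""
        with self._cond:
            ticket = self._next_ticket
            self._requested = True
            self._cond.notify_all()
            return ticket

    def wait_for(self, ticket):
        """Block until a flush that started after this ticket was issued is done."""
        with self._cond:
            self._cond.wait_for(lambda: self._completed >= ticket)

    def flush_once(self):
        """One flusher iteration: cutoff under lock, flush outside it, publish."""
        with self._cond:
            self._cond.wait_for(lambda: self._requested)
            cutoff = self._next_ticket
            self._next_ticket += 1     # later requests board the next train
            self._requested = False
        self._flush_fn()               # the expensive sync runs without the lock
        with self._cond:
            self._completed = cutoff
            self._cond.notify_all()


flush_calls = []
flusher = TicketedFlusher(lambda: flush_calls.append("flush"))
first = flusher.request_flush()
second = flusher.request_flush()       # same train as the first request
flusher.flush_once()
flusher.wait_for(first)
flusher.wait_for(second)
```

&lt;p&gt;In this sketch both requests share one flush, and a request that arrives after the cutoff gets the next ticket, so it is never satisfied by a flush that began before it asked. Unlike a comparison against lastDurable, the counter does not depend on OpTime ordering.&lt;/p&gt;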

&lt;p&gt;Here is my &lt;a href=&quot;https://gist.github.com/ldennis/9f43713f1d7d3261af8883b3826b537a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;POC&lt;/a&gt; using the lastDurable with the future-based api introduced in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-43135&quot; title=&quot;Introduce a future-based API for waiting for write concern&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-43135&quot;&gt;&lt;del&gt;SERVER-43135&lt;/del&gt;&lt;/a&gt; (see wiredtiger_kv_engine.cpp and write_concern.cpp). And the performance gain for {w: majority} (which does not have the problem mentioned above):&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;{w: majority}&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;1&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;32&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;64&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;256&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;1024&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;before&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;272&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;3424&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;5482&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;10529&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;10038&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;after&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;271&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;3513&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;5901&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;12605&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;17737&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;I believe the performance gain came from reducing contention in &lt;a href=&quot;https://github.com/mongodb/mongo/blob/15c6c085126f5d459f30191ef736c10607bea3f6/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp#L280-L289&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;_lastSyncMutex&lt;/tt&gt; in &lt;tt&gt;waitUntilDurable&lt;/tt&gt;&lt;/a&gt;. With a high number of concurrent writers (with j: true), all writers block on the mutex, even though some of them may find, once they are granted the mutex, that someone else has already synced. I suspect there is some kind of thundering herd problem going on here. Also note that the window for two log flush requests to sync at once is fairly small (line 280 - line 289). If a waiter arrives after &lt;tt&gt;_lastSyncTime&lt;/tt&gt; is incremented, it has to wait for its turn to flush again even though technically it was already synced by the previous flush.&lt;/p&gt;

&lt;p&gt;Here is some profiling data. We can see that the average time taken in &lt;tt&gt;WiredTigerSessionCache::waitUntilDurable&lt;/tt&gt; is significantly longer than the time it spends on &lt;tt&gt;__session_log_flush&lt;/tt&gt;. This suggests contention on the mutex.&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Func&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Min (us)&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Max (us)&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Avg (us)&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Count&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;Total (us)&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;__session_log_flush&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;5&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;46198&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;1598&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;11324&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;18099227&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;WiredTigerSessionCache::waitUntilDurable&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;13&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;148483&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;10974&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;357477&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;3923132716&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;To sum up, my high-level idea is to have a single thread handle all flush requests while waiters wait for notifications (Open question: lastDurable? a counter? or ?).&lt;/p&gt;

&lt;p&gt;CC: &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=milkie&quot; class=&quot;user-hover&quot; rel=&quot;milkie&quot;&gt;milkie&lt;/a&gt;, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=geert.bosch&quot; class=&quot;user-hover&quot; rel=&quot;geert.bosch&quot;&gt;geert.bosch&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2434954" author="lingzhi.deng" created="Thu, 26 Sep 2019 17:51:10 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-41392&quot; title=&quot;Modify the _oplogJournalThreadLoop() to no longer call waitUntilDurable() and instead update the oplogTruncateAfterPoint&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-41392&quot;&gt;&lt;del&gt;SERVER-41392&lt;/del&gt;&lt;/a&gt; and PM-1274 will be changing how the oplog manager works. But it would still be useful in the future to have the ability to signal a flusher thread and wait for last durable opTime. I am taking this out of PM-1456 and will revisit this once PM-1274 is done.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                        <issuelink>
            <issuekey id="915296">SERVER-43135</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="1106823">SERVER-45665</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="942203">SERVER-43658</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25128"><![CDATA[Replication]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 5 Feb 2020 18:55:28 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 1 week ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/SERVER-43135'>SERVER-43135</a></s>, <s><a href='https://jira.mongodb.org/browse/PM-1274'>PM-1274</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 1 week ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-repl</customfieldvalue>
            <customfieldvalue>dianna.hohensee@mongodb.com</customfieldvalue>
            <customfieldvalue>lingzhi.deng@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvrodj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr6sfz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvramv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>