<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:26:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-51650] Primary-Only Service&apos;s _rebuildCV should be notified even if stepdown happens quickly after stepup</title>
                <link>https://jira.mongodb.org/browse/SERVER-51650</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;When stepup completes,&#160;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L240-L242&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;each&lt;/a&gt; Primary-Only Service&apos;s _state is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L365&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;set to kRebuilding&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Both&#160;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L519-L520&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PrimaryOnlyService::lookupInstance&lt;/a&gt; and&#160;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L486-L487&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PrimaryOnlyService::getOrCreateInstance&lt;/a&gt; wait until the _rebuildCV condition variable is notified and the _state is no longer kRebuilding.&lt;/p&gt;

&lt;p&gt;_rebuildCV is notified in &lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L550&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PrimaryOnlyService::_rebuildInstances&lt;/a&gt;, which on stepup is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L390&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;scheduled to run asynchronously&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If stepdown occurs before _rebuildInstances starts, e.g. if stepdown occurs &lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L388&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;, then _rebuildCV may never be notified. So, any threads blocking in lookupInstance or getOrCreateInstance that don&apos;t get interrupted by stepdown will block indefinitely.&lt;/p&gt;

&lt;p&gt;Currently, there is an&#160;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L513-L516&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;invariant&lt;/a&gt; in lookupInstance that the thread is guaranteed to be interrupted by stepdown. Otherwise, if the thread is holding the RSTL lock, the thread would prevent the stepdown from completing, leading to a deadlock.&lt;/p&gt;

&lt;p&gt;It would be better to notify _rebuildCV &lt;a href=&quot;https://github.com/mongodb/mongo/blob/35d7e75bca7cae7bfc984db0dbc1a5099821ccc4/src/mongo/db/repl/primary_only_service.cpp#L391&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt; to guarantee threads cannot block indefinitely in lookupInstance or getOrCreateInstance.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Acceptance criteria:&lt;/b&gt;&#160;&lt;/p&gt;

&lt;p&gt;Reproduce issue in unit test&lt;br/&gt;
Fix as suggested&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="1515354">SERVER-51650</key>
            <summary>Primary-Only Service&apos;s _rebuildCV should be notified even if stepdown happens quickly after stepup</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="wenbin.zhu@mongodb.com">Wenbin Zhu</assignee>
                                    <reporter username="esha.maharishi@mongodb.com">Esha Maharishi</reporter>
                        <labels>
                            <label>lowcontext</label>
                            <label>servicearch-wfbf-day</label>
                    </labels>
                <created>Thu, 15 Oct 2020 12:49:04 +0000</created>
                <updated>Sun, 29 Oct 2023 22:01:54 +0000</updated>
                            <resolved>Wed, 7 Jun 2023 15:39:29 +0000</resolved>
                                                    <fixVersion>7.1.0-rc0</fixVersion>
                                                        <votes>0</votes>
                                    <watches>10</watches>
                                                                                                                <comments>
                            <comment id="5480101" author="xgen-internal-githook" created="Wed, 7 Jun 2023 04:58:31 +0000"  >&lt;p&gt;Author: &lt;/p&gt;
{&apos;name&apos;: &apos;Wenbin Zhu&apos;, &apos;email&apos;: &apos;wenbin.zhu@mongodb.com&apos;, &apos;username&apos;: &apos;WenbinZhu&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-51650&quot; title=&quot;Primary-Only Service&amp;#39;s _rebuildCV should be notified even if stepdown happens quickly after stepup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-51650&quot;&gt;&lt;del&gt;SERVER-51650&lt;/del&gt;&lt;/a&gt; Ensure PrimaryOnlyService state gets updated in case of rebuild failure.&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/f0e996cbf2cee788498ea04d2586cc1a851fbaf4&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/f0e996cbf2cee788498ea04d2586cc1a851fbaf4&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4773416" author="george.wangensteen" created="Wed, 24 Aug 2022 13:04:27 +0000"  >&lt;p&gt;The commit was reverted and therefore isn&apos;t present in the master or 6.1 branches. (The 6.1.0-rc0 in the fixVersion field cannot be cleared.)&lt;/p&gt;</comment>
                            <comment id="4758768" author="suganthi.mani" created="Wed, 17 Aug 2022 20:50:50 +0000"  >&lt;p&gt;Since the fix got reverted, don&apos;t we need to reopen this ticket and clear the fix version? It currently gives a wrong indication that a fix for this ticket is in 6.1.0-rc0.&lt;/p&gt;</comment>
                            <comment id="4723459" author="xgen-internal-githook" created="Tue, 2 Aug 2022 20:25:46 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;mathisbessamdb&apos;, &apos;email&apos;: &apos;mathis.bessa@mongodb.com&apos;, &apos;username&apos;: &apos;mathisbessamdb&apos;}
&lt;p&gt;Message: Revert &quot;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-51650&quot; title=&quot;Primary-Only Service&amp;#39;s _rebuildCV should be notified even if stepdown happens quickly after stepup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-51650&quot;&gt;&lt;del&gt;SERVER-51650&lt;/del&gt;&lt;/a&gt; Ensure failure to rebuild PrimaryOnlyService on step up results in state change&quot;&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/ea9f1be226f4611f62e8070bfc1ce9e5ef461b47&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/ea9f1be226f4611f62e8070bfc1ce9e5ef461b47&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4719822" author="JIRAUSER1262621" created="Mon, 1 Aug 2022 22:35:00 +0000"  >&lt;p&gt;We are reverting this due to a race condition that was introduced and related to the operation context. As mentioned above more details can be found in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-68438&quot; title=&quot;Fix PrimaryOnlyService race condition with the PrimaryOnlyServiceClientObserver&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-68438&quot;&gt;&lt;del&gt;SERVER-68438&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="4719505" author="george.wangensteen" created="Mon, 1 Aug 2022 20:32:17 +0000"  >&lt;p&gt;Reverting due to this making the bug documented in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-68438&quot; title=&quot;Fix PrimaryOnlyService race condition with the PrimaryOnlyServiceClientObserver&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-68438&quot;&gt;&lt;del&gt;SERVER-68438&lt;/del&gt;&lt;/a&gt; occur more frequently in the waterfall.&lt;/p&gt;</comment>
                            <comment id="4677436" author="xgen-internal-githook" created="Wed, 13 Jul 2022 18:31:24 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;George Wangensteen&apos;, &apos;email&apos;: &apos;george.wangensteen@mongodb.com&apos;, &apos;username&apos;: &apos;gewa24&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-51650&quot; title=&quot;Primary-Only Service&amp;#39;s _rebuildCV should be notified even if stepdown happens quickly after stepup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-51650&quot;&gt;&lt;del&gt;SERVER-51650&lt;/del&gt;&lt;/a&gt; Ensure failure to rebuild PrimaryOnlyService on step up results in state change&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/11ed931625c685b453a5244553f1d97c81b80850&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/11ed931625c685b453a5244553f1d97c81b80850&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4619303" author="esha.maharishi@10gen.com" created="Wed, 15 Jun 2022 22:34:52 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt; makes sense, thanks.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=george.wangensteen%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;george.wangensteen@mongodb.com&quot;&gt;george.wangensteen@mongodb.com&lt;/a&gt; , ah I see, good point. So the continuations might run even if the error was due to stepdown. Maybe we could check if the state is still kRebuilding and only if so call _setState(kRebuildFailed)?&lt;/p&gt;</comment>
                            <comment id="4618564" author="george.wangensteen" created="Wed, 15 Jun 2022 18:19:31 +0000"  >&lt;p&gt;Ah, thanks for that explanation &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=esha.maharishi%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;esha.maharishi@mongodb.com&quot;&gt;esha.maharishi@mongodb.com&lt;/a&gt; .&lt;/p&gt;

&lt;p&gt;I think the race might still be possible, though, because onStepDown (and therefore interruptInstances, and therefore the shutdown of _scopedExecutor) will be called after the node transitioned out of Primary, whereas the future returned by waitUntilMajority could be set with an error anytime after the WaitForMajorityService&apos;s &lt;a href=&quot;https://github.com/mongodb/mongo/blob/d762bb7bc5e99c387fe16468c562132de24c5a45/src/mongo/db/repl/wait_for_majority_service.cpp#L203&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;call to waitForWriteConcern&lt;/a&gt; fails. In particular, the POS is informed of the stepDown by the ReplicaSetAwareService infrastructure in &lt;a href=&quot;#L4626]&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;ReplicationCoordinatorImpl::_performPostMemberStateUpdateAction&lt;/a&gt; which is called in ReplicationCoordinatorImpl::stepdown after calls to awaitReplication are set with error &lt;a href=&quot;https://github.com/mongodb/mongo/blob/d762bb7bc5e99c387fe16468c562132de24c5a45/src/mongo/db/repl/replication_coordinator_impl.cpp#L4520-L4521&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt; via the _replicationWaiterList. So I think it&apos;s possible for the waitUntilMajority future to be set with an error (and therefore for continuations scheduled on it to run) before the POS::onStepDown logic runs and shuts down the executor. I definitely could be missing something from this analysis though as my knowledge of the ReplicationCoordinator is limited so let me know if something sounds off.&#160; To ensure there would be no race between the onStepDown/onStepUp threads, I think we&apos;d need to make sure that the POS::onStepDown logic runs &lt;em&gt;before&lt;/em&gt; it is possible for the waitUntilMajority future to be set with an error due to step down.&lt;/p&gt;</comment>
                            <comment id="4616131" author="max.hirschhorn@10gen.com" created="Tue, 14 Jun 2022 22:36:33 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Max Hirschhorn, are there cases you know of where waitUntilMajority would return an error besides stepdown/shutdown?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There weren&apos;t other cases from what I had peeked at in the wait&amp;#95;for&amp;#95;majority&amp;#95;service.cpp implementation.&lt;/p&gt;</comment>
                            <comment id="4611977" author="esha.maharishi@10gen.com" created="Mon, 13 Jun 2022 17:47:06 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt;, are there cases you know of where waitUntilMajority would return an error besides stepdown/shutdown?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=george.wangensteen%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;george.wangensteen@mongodb.com&quot;&gt;george.wangensteen@mongodb.com&lt;/a&gt;, are you sure there would be a race between the stepdown/shutdown thread and the onStepUp thread? The onStepUp thread&apos;s continuations &lt;a href=&quot;https://github.com/mongodb/mongo/blob/e878b08539f91ffddfae9692a57e9ea574f16bcb/src/mongo/db/repl/primary_only_service.cpp#L378-L404&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;run on _newScopedExecutor&lt;/a&gt;, which &lt;a href=&quot;https://github.com/mongodb/mongo/blob/e878b08539f91ffddfae9692a57e9ea574f16bcb/src/mongo/db/repl/primary_only_service.cpp#L349&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;is a reference to _scopedExecutor&lt;/a&gt;, and the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/e878b08539f91ffddfae9692a57e9ea574f16bcb/src/mongo/db/repl/primary_only_service.cpp#L411-L413&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;stepdown/shutdown thread would shut down _scopedExecutor&lt;/a&gt;. So, the continuations wouldn&apos;t run on stepdown/shutdown. The continuations would only set the state to kRebuildFailed and notify _stateChangedCV if waitUntilMajority returned an error &lt;b&gt;besides&lt;/b&gt;&#160;stepdown/shutdown.&lt;/p&gt;</comment>
                            <comment id="4611442" author="george.wangensteen" created="Mon, 13 Jun 2022 15:41:52 +0000"  >&lt;p&gt;I looked into this and chatted with &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=janna.golden%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;janna.golden@mongodb.com&quot;&gt;janna.golden@mongodb.com&lt;/a&gt; about it a bit. I think this is no longer a bug, but the situation has become a bit convoluted to reason about. Essentially:&lt;/p&gt;

&lt;p&gt;(1) If the continuations chained with .then() after the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/e878b08539f91ffddfae9692a57e9ea574f16bcb/src/mongo/db/repl/primary_only_service.cpp#L377&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;call to waitUntilMajority&lt;/a&gt; do not run, then either:&lt;/p&gt;

&lt;p&gt;&#160;&#160;&#160; (1a) the associated cancellation source _source was cancelled&lt;/p&gt;

&lt;p&gt;&#160;&#160;&#160; (1b) the newScopedExecutor was shut down&lt;/p&gt;

&lt;p&gt;&#160;&#160;&#160; (1c) the WaitForMajorityService was shut down&lt;/p&gt;

&lt;p&gt;&#160;&#160;&#160; (1d) &lt;a href=&quot;https://github.com/mongodb/mongo/blob/e878b08539f91ffddfae9692a57e9ea574f16bcb/src/mongo/db/write_concern.h#L106&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;waitForWriteConcern&lt;/a&gt; returned an error status.&lt;/p&gt;

&lt;p&gt;(1a) and (1b) can happen only if the primary only service onStepDown or shutdown have been called - and both of these functions currently change the state from _kRebuilding and wake the CV.&lt;/p&gt;

&lt;p&gt;(1c) can happen only on process shutdown, and occurs after the replication coordinator steps/shuts down, which triggers POS stepdown/shutdown logic, so the same argument from above applies&lt;/p&gt;

&lt;p&gt;(1d) can occur only if there is a stepdown while waiting for replication (in which case the argument from (1a)/(1b) applies), or if there is an error in the specified write concern, which should not be the case as it derives from the WFMS.&lt;/p&gt;

&lt;p&gt;So in all cases, _state should be changed from kRebuilding and the CV should be notified. If the chained continuations do run, then either the node is still primary and rebuilding continues, or the term has changed and therefore the node has stepped down, again allowing us to apply the argument from (1a)/(1b). So I don&apos;t see a case where the _stateChangedCV is not notified even if stepdown happens quickly after stepup.&lt;/p&gt;

&lt;p&gt;It is difficult to add some deterministic assertions to verify this though, because there is a race between the stepdown/shutdown thread setting the state and waking the CV and the onStepUp thread waking from waitUntilMajority failing and setting the state to kRebuildFailed. This race should be harmless from a correctness perspective but makes it difficult to make the state-change ordering deterministic. Note that this &lt;a href=&quot;https://github.com/mongodb/mongo/blob/e1546a4cf57fc6104a60725839e13efc9e1e3a4d/src/mongo/db/repl/primary_only_service_test.cpp#L1112&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;test&lt;/a&gt; addresses the bug concern described in the ticket, because it ensures that the thread calling lookupInstances() is woken by the stepdown thread even when stepdown occurs quickly after stepup/the onStepUp async future chain does not succeed.&#160;&lt;/p&gt;

&lt;p&gt;I think we have a few options:&lt;/p&gt;

&lt;p&gt;-&amp;gt; Add some comments to the code explaining the above and all legal _state transitions, but leave things as they are. File a ticket to try and simplify state management.&lt;/p&gt;

&lt;p&gt;-&amp;gt; Make the end of the onStepUp future chain set _state to kRebuildFailed for the purposes of clarity when reading. As mentioned above, this introduces a race where the onStepUp thread can set the state to kRebuildFailed after stepdown sets it to kPaused, but this race should be harmless.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=esha.maharishi%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;esha.maharishi@mongodb.com&quot;&gt;esha.maharishi@mongodb.com&lt;/a&gt; Do you have any thoughts on what might be helpful here or if this analysis is missing something/wrong? Thanks!&lt;/p&gt;</comment>
                            <comment id="4480192" author="esha.maharishi@10gen.com" created="Wed, 13 Apr 2022 14:25:41 +0000"  >&lt;p&gt;Thanks &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt; for pointing out that while &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-61717&quot; title=&quot;Ensure a POS instance remains in the POS map until the instance&amp;#39;s run() is complete&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-61717&quot;&gt;SERVER-61717&lt;/a&gt; will remove one reason to call opCtx-&amp;gt;setAlwaysInterruptAtStepDownOrUp (caller can be waiting on an instance that is removed from the map), this ticket represents another reason to call opCtx-&amp;gt;setAlwaysInterruptAtStepDownOrUp (caller is waiting on _rebuildCV) that would still need to be addressed.&lt;/p&gt;</comment>
                            <comment id="4480034" author="esha.maharishi@10gen.com" created="Wed, 13 Apr 2022 13:45:21 +0000"  >&lt;p&gt;Just a note that &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-61717&quot; title=&quot;Ensure a POS instance remains in the POS map until the instance&amp;#39;s run() is complete&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-61717&quot;&gt;SERVER-61717&lt;/a&gt; will remove the need to call opCtx-&amp;gt;setAlwaysInterruptAtStepDownOrUp. It isn&apos;t scheduled, but is a high priority item on the Serverless backlog.&lt;/p&gt;</comment>
                            <comment id="4474929" author="max.hirschhorn@10gen.com" created="Mon, 11 Apr 2022 21:46:51 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=blake.oler%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;blake.oler@mongodb.com&quot;&gt;blake.oler@mongodb.com&lt;/a&gt;, I believe this issue has been worked around in practice by having callers do opCtx&amp;#45;&amp;gt;setAlwaysInterruptAtStepDownOrUp() before calling PrimaryOnlyService::lookupInstance() or PrimaryOnlyService::getOrCreateInstance() because they cannot rely on &amp;#95;waitForStateNotRebuilding() to eventually be satisfied. My impression has been we&apos;re looking to remove the pattern of opCtx&amp;#45;&amp;gt;setAlwaysInterruptAtStepDownOrUp() from our C++ codebase and fixing PrimaryOnlyService&apos;s state transition lifecycle is one required piece of that.&lt;/p&gt;</comment>
                            <comment id="4473830" author="blake.oler" created="Mon, 11 Apr 2022 18:11:27 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt; has this been seen in the wild or in Evergreen? Also curious about relative impact of the issue for prioritization.  &lt;/p&gt;</comment>
                            <comment id="4466249" author="max.hirschhorn@10gen.com" created="Thu, 7 Apr 2022 23:33:11 +0000"  >&lt;p&gt;Reopening this ticket because &lt;a href=&quot;https://github.com/mongodb/mongo/blob/be3948861204ef3a347538ad2acf1fe33d843e7e/src/mongo/db/repl/primary_only_service.cpp#L376&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;PrimaryOnlyService::&amp;#95;state is still not being set to PrimaryOnlyService::State::kRebuildFailed when WaitForMajorityService::waitUntilMajority() returns an error future&lt;/a&gt;. I believe a similar case applies if &lt;tt&gt;newScopedExecutor&lt;/tt&gt; is shut down.&lt;/p&gt;

&lt;p&gt;Note that while &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-62682&quot; title=&quot;PrimaryOnlyService Does Not Call _rebuildCV.notify_all() leading to calls to waitForConditionOrInterrupt not being triggered&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-62682&quot;&gt;&lt;del&gt;SERVER-62682&lt;/del&gt;&lt;/a&gt; addressed cases where PrimaryOnlyService::&amp;#95;stateChangeCV wasn&apos;t being notified, it didn&apos;t improve the situation described in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-51650&quot; title=&quot;Primary-Only Service&amp;#39;s _rebuildCV should be notified even if stepdown happens quickly after stepup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-51650&quot;&gt;&lt;del&gt;SERVER-51650&lt;/del&gt;&lt;/a&gt;. The future chain must guarantee PrimaryOnlyService::&amp;#95;state != PrimaryOnlyService::State::kRebuilding at the end (i.e. on success or failure) and ought to additionally use PrimaryOnlyService::&amp;#95;setState() to ensure it wakes up any waiters.&lt;/p&gt;</comment>
                            <comment id="4375726" author="JIRAUSER1262719" created="Thu, 24 Feb 2022 21:56:45 +0000"  >&lt;p&gt;We haven&#8217;t heard back from you for at least one calendar year, so this issue is being closed. If this is still an issue for you, please provide additional information and we will reopen the ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10520">
                    <name>Problem/Incident</name>
                                            <outwardlinks description="causes">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="1964638">SERVER-62682</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2022709">SERVER-65469</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2102786">SERVER-68438</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2104105">SERVER-68476</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2043397">SERVER-66351</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1932481">SERVER-61717</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1477562">SERVER-50982</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1512801">SERVER-51518</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>18.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25132"><![CDATA[Service Arch]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12450" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Backport Requested</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="24444"><![CDATA[v6.1]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 24 Feb 2022 21:56:45 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        35 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_17050" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Downstream Team Attention</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16941"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            35 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>131.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>blake.oler@mongodb.com</customfieldvalue>
                            <customfieldvalue>esha.maharishi@mongodb.com</customfieldvalue>
                            <customfieldvalue>george.wangensteen@mongodb.com</customfieldvalue>
                            <customfieldvalue>xgen-internal-githook</customfieldvalue>
                            <customfieldvalue>lauren.lewis@mongodb.com</customfieldvalue>
                            <customfieldvalue>mathis.bessa@mongodb.com</customfieldvalue>
                            <customfieldvalue>max.hirschhorn@mongodb.com</customfieldvalue>
                            <customfieldvalue>suganthi.mani@mongodb.com</customfieldvalue>
                            <customfieldvalue>wenbin.zhu@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hycuif:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr1gua:i</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_22250" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Special Downgrade Instructions Required</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="23343"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="6263">Service Arch 2022-05-30</customfieldvalue>
                                <customfieldvalue id="6267">Service Arch 2022-06-13</customfieldvalue>
                                <customfieldvalue id="6283">Service Arch 2022-06-27</customfieldvalue>
                                <customfieldvalue id="6303">Service Arch 2022-07-11</customfieldvalue>
                                <customfieldvalue id="6326">Service Arch 2022-07-25</customfieldvalue>
                                <customfieldvalue id="6750">Service Arch 2023-02-20</customfieldvalue>
                                <customfieldvalue id="6751">Service Arch 2023-03-06</customfieldvalue>
                                <customfieldvalue id="7348">Service Arch 2023-05-01</customfieldvalue>
                                <customfieldvalue id="7349">Service Arch 2023-05-15</customfieldvalue>
                                <customfieldvalue id="7350">Service Arch 2023-05-29</customfieldvalue>
                                <customfieldvalue id="7351">Service Arch 2023-06-12</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10555" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Story Points</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hycgrr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>