<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 06:12:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-69028] Collect thread migrations in FTDC</title>
                <link>https://jira.mongodb.org/browse/SERVER-69028</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;h2&gt;&lt;a name=&quot;Background&quot;&gt;&lt;/a&gt;Background&lt;/h2&gt;

&lt;p&gt;Thread migrations happen when the kernel load balancer moves a scheduled process/thread from one runqueue to another because the inefficiency of the unbalanced &quot;load&quot; is higher than the migration cost. A high frequency of thread migrations indicates that the thread model is suboptimal: one or more cores are left idle while others have a runqueue of jobs ready to run. This may happen in at least two cases: high lock contention and/or improper use of thread pools (too many thread pools, thread pools too large or too small, etc.).&lt;/p&gt;

&lt;h2&gt;&lt;a name=&quot;Motivation&quot;&gt;&lt;/a&gt;Motivation&lt;/h2&gt;

&lt;p&gt;We know our thread model is suboptimal. We have a thread-per-request model with too many auxiliary thread pools. The approximate roadmap to fix that is:&lt;br/&gt;
1. Consolidate thread pools&lt;br/&gt;
2. Create a special thread pool for blocking calls (see my &lt;a href=&quot;https://docs.google.com/presentation/d/1E-xC3EeDcHkQPlHAVdPIeSNBkXANVm_NSEBdFzFJ9gc/edit?usp=sharing&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;presentation&lt;/a&gt; for details)&lt;br/&gt;
3. Migrate to a more asynchronous model&lt;br/&gt;
4. Design load-based admission control to reject requests that cannot be executed soon&lt;br/&gt;
5. Design proper token-bucket based user isolation&lt;/p&gt;

&lt;p&gt;For all of those tasks we need proper measurements. Benchmarking the code during development is time-consuming and not necessary. Using thread migrations as a quick negative signal is easy and more productive. A low frequency of thread migrations is not sufficient to indicate that the thread model is good, but a high frequency is always bad. When this signal is good, others can be used (profiling, lock contention measurements, etc.).&lt;/p&gt;

&lt;h2&gt;&lt;a name=&quot;Approximatedesign&quot;&gt;&lt;/a&gt;Approximate design&lt;/h2&gt;

&lt;p&gt;The current core can be detected with the following code:&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: #006699; font-weight: bold; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;static&lt;/span&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt; unsigned getCoreId() {&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    unsigned id;&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    __rdtscp(&amp;amp;id);&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    &lt;/span&gt;&lt;span style=&quot;color: #006699; font-weight: bold; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;return&lt;/span&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt; id;&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;}&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.microsoft.com/en-us/cpp/intrinsics/rdtscp?view=msvc-170&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;This&lt;/a&gt; claims that &amp;#95;&amp;#95;rdtscp is &lt;em&gt;not&lt;/em&gt; a serializing instruction; it only &quot;waits until all previous instructions have executed&quot;. &lt;a href=&quot;https://stackoverflow.com/questions/41786929/is-mfence-for-rdtsc-necessary-on-x86-64-platform&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;This&lt;/a&gt; also confirms this statement. Thus we probably should not worry about performance implications. CC &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=amirsaman.memaripour%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;amirsaman.memaripour@mongodb.com&quot;&gt;amirsaman.memaripour@mongodb.com&lt;/a&gt; to confirm.&lt;/p&gt;

&lt;p&gt;It is very convenient to use a thread-local variable to track the last observed core, because a thread-local variable migrates together with its thread to the new core. The implementation should query the current core sufficiently often and increment a thread-local counter to accumulate the observed migrations. In production, we may observe involuntary context switches to the tune of 200k QPS, which hints that it will be sufficient to query the current core inside a new listener on `_onContendedLock()` and then on `_onUnlock()`. Perhaps it will be cheaper to add a callback `_onContendedUnlock()`, because a migration is unlikely to happen if the current thread was not put to sleep. Remember, a thread migration happens only while the thread is on a runqueue.&lt;/p&gt;

&lt;h2&gt;&lt;a name=&quot;Collection&quot;&gt;&lt;/a&gt;Collection&lt;/h2&gt;

&lt;p&gt;Collection requirements are:&lt;br/&gt;
1. Keep it as simple as possible for now, with future improvements in mind&lt;br/&gt;
2. Never increment a global counter for each migration; this is very expensive. Accumulate in a thread-local and flush periodically&lt;br/&gt;
3. The thread-local (TL) counter flush can be attached to an RAII decorator on opCtx; this may increment the global counter, but probably no more than ~3k times/s (and many increments will be 0)&lt;br/&gt;
4. Keep in mind that in the future we would like to collect some stats per user and per op; this one is a good candidate&lt;/p&gt;

&lt;p&gt;We should accumulate the current migration count in a thread-local and flush it when the opCtx is created and destroyed. The longer-term task will be to bucket this count by command and by user. This will give us insight into which commands and which users are associated with the most thread migrations. This may also be used for better user isolation in the future: the user creating the most thread migrations should be the first to be throttled.&lt;/p&gt;

&lt;p&gt;Flushing this counter to opCtx is easy as long as we use the thread-per-connection model, but it will break later. For the asynchronous model, we will need to flush on ThreadClient destruction, and then when the thread is recycled in the thread pool. This is conditional on observing this as a useful signal in production.&lt;/p&gt;

&lt;h2&gt;&lt;a name=&quot;Roadmap&quot;&gt;&lt;/a&gt;Roadmap&lt;/h2&gt;

&lt;p&gt;1. Implement a simple solution (this ticket)&lt;br/&gt;
2. Observe the value in stress tests, benchmarks and help incidents&lt;br/&gt;
3. If a good signal is observed, implement the wrapper for blocking calls as described in the &lt;a href=&quot;https://docs.google.com/presentation/d/1E-xC3EeDcHkQPlHAVdPIeSNBkXANVm_NSEBdFzFJ9gc/edit?usp=sharing&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;presentation&lt;/a&gt; and test it with the stress tests where the signal was good&lt;br/&gt;
4. Optimize the thread pool code using the wrapper implementation and the tests where the good signal was observed&lt;br/&gt;
5. Work on the roadmap for a more asynchronous model&lt;/p&gt;</description>
                <environment></environment>
        <key id="2118614">SERVER-69028</key>
            <summary>Collect thread migrations in FTDC</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-server-servicearch">Backlog - Service Architecture</assignee>
                                    <reporter username="andrew.shuvalov@mongodb.com">Andrew Shuvalov</reporter>
                        <labels>
                    </labels>
                <created>Sun, 21 Aug 2022 17:52:11 +0000</created>
                <updated>Thu, 2 Feb 2023 21:00:59 +0000</updated>
                                                                                                <votes>0</votes>
                                    <watches>9</watches>
                                                                                                                <comments>
                            <comment id="4773250" author="bruce.lucas@10gen.com" created="Wed, 24 Aug 2022 11:52:42 +0000"  >&lt;p&gt;Very good, thanks for the confirmation.&lt;/p&gt;</comment>
                            <comment id="4772006" author="JIRAUSER1256988" created="Tue, 23 Aug 2022 22:14:19 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bruce.lucas%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;bruce.lucas@mongodb.com&quot;&gt;bruce.lucas@mongodb.com&lt;/a&gt; yes, not in this ticket. The idea of per-user counters is &lt;em&gt;not&lt;/em&gt; to expose them in FTDC but to use them in our future user isolation implementation. As we don&apos;t have one yet, there is no point in adding them now. We need to assemble a list of ~3 different metrics to pinpoint abusive users for use in user isolation. When we decide to do it, thread migrations will be one of those 3.&lt;/p&gt;

&lt;p&gt;The per-op metric is strictly for manual investigations and can be exposed with an additional verbose field in `serverStatus`. We should never have this kind of granularity in default FTDC. So yes, just 1 new metric.&lt;/p&gt;</comment>
                            <comment id="4770628" author="bruce.lucas@10gen.com" created="Tue, 23 Aug 2022 15:26:57 +0000"  >&lt;p&gt;Adding a single counter to FTDC sounds reasonable. I would be concerned about adding anything per-command or per-user to FTDC because of the volume of counters it could create.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25132"><![CDATA[Service Arch]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 23 Aug 2022 15:26:57 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        1 year, 24 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>elizabeth.roytburd@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            1 year, 24 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>andrew.shuvalov@mongodb.com</customfieldvalue>
            <customfieldvalue>backlog-server-servicearch</customfieldvalue>
            <customfieldvalue>bruce.lucas@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i1754f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i0pwug:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i16r9r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>