<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:42:59 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-57860] $sampleFromRandomCursor should fallback to using a collection scan when random cursor returns too many duplicate </title>
                <link>https://jira.mongodb.org/browse/SERVER-57860</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The current logic of $sampleFromRandomCursor is to throw a uassert when the random cursor returns too many (100) duplicate records. We should instead fallback to using a collection scan in this case. &lt;/p&gt;</description>
                <environment></environment>
        <key id="1791107">SERVER-57860</key>
            <summary>$sampleFromRandomCursor should fallback to using a collection scan when random cursor returns too many duplicate </summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="12300">Won&apos;t Do</resolution>
                                        <assignee username="backlog-query-execution">Backlog - Query Execution</assignee>
                                    <reporter username="arun.banala@mongodb.com">Arun Banala</reporter>
                        <labels>
                    </labels>
                <created>Mon, 21 Jun 2021 10:22:34 +0000</created>
                <updated>Tue, 6 Dec 2022 01:13:18 +0000</updated>
                            <resolved>Mon, 28 Jun 2021 19:57:48 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="3906022" author="kyle.suarez" created="Mon, 28 Jun 2021 19:57:49 +0000"  >&lt;p&gt;Closing this ticket as Won&apos;t Do, as we would rather not implement the fallbacks mentioned here. They probably won&apos;t be helpful for resharding.&lt;/p&gt;</comment>
                            <comment id="3905596" author="max.hirschhorn@10gen.com" created="Mon, 28 Jun 2021 17:31:04 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=arun.banala&quot; class=&quot;user-hover&quot; rel=&quot;arun.banala&quot;&gt;arun.banala&lt;/a&gt;, I feel like that would mean most documents in the collection are arbitrarily assigned a probability of zero by virtue of not appearing at the beginning of the collection scan then. This seems like more bias than $sample attempts to generally achieve. I had misunderstood and thought you were proposing falling back to the non storage engine optimized case of scanning through the whole collection.&lt;/p&gt;</comment>
                            <comment id="3905162" author="arun.banala" created="Mon, 28 Jun 2021 15:45:09 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kyle.suarez&quot; class=&quot;user-hover&quot; rel=&quot;kyle.suarez&quot;&gt;kyle.suarez&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt;, I was thinking more along the lines of return the first (N-K) documents that we find during a collection scan. Given that this scenario is unlikely to happen, I thought we can do something like this instead of failing. But we can definitely wait for storage team to investigate what&apos;s was happening in failure case. &lt;/p&gt;</comment>
                            <comment id="3898841" author="kyle.suarez" created="Thu, 24 Jun 2021 20:17:41 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=ethan.zhang&quot; class=&quot;user-hover&quot; rel=&quot;ethan.zhang&quot;&gt;ethan.zhang&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=arun.banala&quot; class=&quot;user-hover&quot; rel=&quot;arun.banala&quot;&gt;arun.banala&lt;/a&gt;, based on this feedback from &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt;, I think we should close this ticket as Won&apos;t Fix, and instead ask Storage Execution to prioritize the investigation in the linked failure.&lt;/p&gt;</comment>
                            <comment id="3898632" author="max.hirschhorn@10gen.com" created="Thu, 24 Jun 2021 19:48:35 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=ethan.zhang&quot; class=&quot;user-hover&quot; rel=&quot;ethan.zhang&quot;&gt;ethan.zhang&lt;/a&gt;, with regard to resharding, I&apos;m not sure this ticket is that desirable. I don&apos;t think it is practical to have reshardCollection scan through the entire collection to sample the data to choose new split points and then scan through the entire collection a second time to clone the actual data. The motivation for choosing the new split points via $sample was because of the storage engine optimization. My preference would be for whichever team it ends up on to get to the bottom of why the storage engine&apos;s random cursor fails to find unique documents.&lt;/p&gt;</comment>
                            <comment id="3897681" author="JIRAUSER1257640" created="Thu, 24 Jun 2021 15:04:35 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt; Can you shed light on what is the priority of this?&lt;/p&gt;</comment>
                            <comment id="3891595" author="JIRAUSER1257640" created="Tue, 22 Jun 2021 15:33:45 +0000"  >&lt;p&gt;We may need to have a flag to opt out of the fallback so we don&apos;t make the resharding performance even worse. This can be left as an implementation detail to be decided later.&lt;/p&gt;</comment>
                            <comment id="3888162" author="arun.banala" created="Mon, 21 Jun 2021 10:34:10 +0000"  >&lt;p&gt;If we had already return K documents and we hit &lt;a href=&quot;https://github.com/mongodb/mongo/blob/66cf66c99e74ea7bfc247ea46f6da35fd1e0417d/src/mongo/db/pipeline/document_source_sample_from_random_cursor.cpp#L139&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this&lt;/a&gt; code block, we could just open a new cursor and return the rest of the N-K using a normal collection scan. We should also ensure that all of the (N-K) are new documents by using &lt;tt&gt;_seenDocs&lt;/tt&gt;. &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="390209">SERVER-29446</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25125"><![CDATA[Query Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 22 Jun 2021 15:33:45 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 32 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 32 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>19.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>arun.banala@mongodb.com</customfieldvalue>
            <customfieldvalue>backlog-query-execution</customfieldvalue>
            <customfieldvalue>ethan.zhang@mongodb.com</customfieldvalue>
            <customfieldvalue>kyle.suarez@mongodb.com</customfieldvalue>
            <customfieldvalue>max.hirschhorn@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hznjr3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hz81rb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="4709">Query Execution 2021-07-12</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzn607:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>