<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:44:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-17533] FTS query plan for AND queries is inefficient for multiple and/or common search words</title>
                <link>https://jira.mongodb.org/browse/SERVER-17533</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;An AND query in MongoDB FTS seems to be implemented as fetching all of the words separately, then taking an intersection. This is not efficient when:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;A lot of search words are given. (e.g 5-10 instead of 1-2.) In theory       this should help narrow down the result set and make the query faster,       but in MongoDB this will always make the query slower. (``nscanned``       and ``nscannedObjects`` will grow even if the result set ``n`` becomes smaller.)&lt;/li&gt;
&lt;/ol&gt;


&lt;ol&gt;
	&lt;li&gt;One or more very common search words are given.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;For example, consider we search a database with people&apos;s addresses with the following words::&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    db.somecollection.find( { $text : { $search : &apos;&quot;John&quot; &quot;Ezequiel&quot; &quot;Smith&quot; &quot;New&quot; &quot;York&quot;&apos; } } )&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;In the above example, one of the words - &quot;Ezequiel&quot; - is clearly an uncommon name, which will occur with low frequency in the database. On the other hand the names &quot;John Smith&quot; and &quot;New York&quot; are clearly very common ones, especially the number of entries for &quot;New York&quot; might very well be in the millions.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;What should happen&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;An efficient query plan for MongoDB would be to just search the FTS index for &quot;Ezequiel&quot;, then filter the result set of that to exclude documents that do not include all of the other search words.&lt;/p&gt;

&lt;p&gt;Is it possible to deduce from our btree indexes how many occurrences there are of the (after stemming) identical value? If yes, the query planner should consider this and only scan the index for values where this number is low (and therefore selectiveness high).&lt;/p&gt;

&lt;p&gt;If such statistics is not possible, one possible way to implement this is analogous to how the query planner selects execution plans for regular queries. Execute one thread per search word in parallel, each thread scanning the index for the given word. When the fastest thread finishes scanning the index, start fetching the documents pointed to, and filter out documents that don&apos;t match all of the given AND search words. When this is finished, return this result set and kill the other threads that are still scanning the index.&lt;/p&gt;

&lt;p&gt;In the above example, query exectuion would start by launching a separate thread for each word to be scanned from the index. The thread scanning for &quot;Ezequiel&quot; would complete first and would then proceed to fetching the documents containing &quot;Ezequiel&quot; and apply filtering on those documents. Once this completes, the threads scanning for &quot;John&quot;, &quot;Smith&quot;, &quot;New&quot; and &quot;York&quot; would simply be terminated and their results discarded.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Workaround&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Note that with the current version of MongoDB, the latter method above can also be implemented on the application level, as a workaround to optimize slow FTS queries.&lt;/p&gt;</description>
                <environment></environment>
        <key id="188796">SERVER-17533</key>
            <summary>FTS query plan for AND queries is inefficient for multiple and/or common search words</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-integration">Backlog - Query Integration</assignee>
                                    <reporter username="henrik.ingo@mongodb.com">Henrik Ingo</reporter>
                        <labels>
                            <label>qi-text-search</label>
                    </labels>
                <created>Wed, 11 Mar 2015 07:38:34 +0000</created>
                <updated>Thu, 28 Dec 2023 18:34:39 +0000</updated>
                                            <version>2.6.8</version>
                                                    <component>Text Search</component>
                                        <votes>2</votes>
                                    <watches>10</watches>
                                                                                                                    <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25467"><![CDATA[Query Integration]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 11 Mar 2015 15:21:44 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        8 years, 49 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ted.tuckman@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            8 years, 49 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-integration</customfieldvalue>
            <customfieldvalue>henrik.ingo@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlb1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2dv3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hru5zr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>