[SERVER-14296] Text search responds slowly when query has negation or phrasing Created: 18/Jun/14  Updated: 28/Dec/23

Status: Backlog
Project: Core Server
Component/s: Text Search
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Austin Estep (Inactive)
Assignee: Backlog - Query Integration
Resolution: Unresolved
Votes: 1
Labels: qi-text-search
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-14578 Text search term negation processing ... Open
Assigned Teams:
Query Integration
Participants:
Case:

 Description   

On a larger data set, text queries that contain negations or phrases take a very long time to complete. Negations are slower than phrases, and a negation that filters out few results is slower than one that removes many.

> db.combined.find({$text: {$search: 'test'}}, {_id: 0, 'project': 1}).explain()
{
	"cursor" : "TextCursor",
	"n" : 32417,
	"nscannedObjects" : 32417,
	"nscanned" : 32417,
	"nscannedObjectsAllPlans" : 32417,
	"nscannedAllPlans" : 32417,
	"scanAndOrder" : false,
	"nYields" : 506,
	"nChunkSkips" : 0,
	"millis" : 114,
	"server" : "overlord.local:27017",
	"filterSet" : false
}
> db.combined.find({$text: {$search: 'test -linux64'}}, {_id: 0, 'project': 1}).explain()
{
	"cursor" : "TextCursor",
	"n" : 32283,
	"nscannedObjects" : 32417,
	"nscanned" : 32417,
	"nscannedObjectsAllPlans" : 32417,
	"nscannedAllPlans" : 32417,
	"scanAndOrder" : false,
	"nYields" : 1663,
	"nChunkSkips" : 0,
	"millis" : 33276,
	"server" : "overlord.local:27017",
	"filterSet" : false
}
> db.combined.find({$text: {$search: 'test -data'}}, {_id: 0, 'project': 1}).explain()
{
	"cursor" : "TextCursor",
	"n" : 20220,
	"nscannedObjects" : 32417,
	"nscanned" : 32417,
	"nscannedObjectsAllPlans" : 32417,
	"nscannedAllPlans" : 32417,
	"scanAndOrder" : false,
	"nYields" : 938,
	"nChunkSkips" : 0,
	"millis" : 15294,
	"server" : "overlord.local:27017",
	"filterSet" : false
}
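
To put the three explain() results side by side, the short script below derives a slowdown factor and a rough extra cost per scanned entry. It is only an arithmetic sketch: the input numbers are copied from the outputs above, while the field names `removed`, `slowdown`, and `extraMsPerScanned` are made up for this illustration, and treating `nscanned - n` as the number of documents the negation removed is an assumption.

```javascript
// Figures copied from the three explain() outputs in this report.
const runs = [
  { search: "test",          n: 32417, nscanned: 32417, millis: 114 },
  { search: "test -linux64", n: 32283, nscanned: 32417, millis: 33276 },
  { search: "test -data",    n: 20220, nscanned: 32417, millis: 15294 },
];

// The query without a negation serves as the baseline.
const baseline = runs[0];

// Hypothetical helper: derive comparison figures for one run.
function summarize(run) {
  return {
    search: run.search,
    // Assumption: scanned entries minus returned documents approximates
    // how many documents the negation filtered out.
    removed: run.nscanned - run.n,
    // How many times slower than the baseline query.
    slowdown: run.millis / baseline.millis,
    // Extra wall-clock time attributed to each scanned entry.
    extraMsPerScanned: (run.millis - baseline.millis) / run.nscanned,
  };
}

const summary = runs.map(summarize);
for (const row of summary) {
  console.log(
    `${row.search}: removed=${row.removed}, ` +
    `slowdown=${row.slowdown.toFixed(1)}x, ` +
    `extra ms per scanned entry=${row.extraMsPerScanned.toFixed(3)}`
  );
}
```

The less selective negation (`-linux64`) removes far fewer documents yet costs more than twice as much as `-data`, which matches the observation in the description.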

I believe the problem stems from a large number of unamortized disk reads while scanning the text index. From db/exec/text.cpp:258-262:

        if (_params.query.hasNonTermPieces()) {
            if (!_ftsMatcher.matchesNonTerm(_params.index->getCollection()->docFor(loc))) {
                return PlanStage::NEED_TIME;
            }
        }

I believe the second line (the docFor(loc) call) loads a document from disk at each step of the text index iterator, which accounts for much of the extra time. However, the machine I ran this test on has an SSD, so some of the time may actually be spent in _ftsMatcher.matchesNonTerm.
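
The suspected access pattern can be illustrated with a toy model. This is a sketch under stated assumptions, not MongoDB's implementation: `runTextFilter`, `loadDoc`, and the in-memory `collection` are hypothetical stand-ins, and only negations are modelled (phrases would go through the same per-entry check). It shows that once a query has a non-term piece, one document load happens per index entry, regardless of how many documents the negation ends up removing.

```javascript
// Toy model only: all names here are hypothetical, not MongoDB internals.
function runTextFilter(indexEntries, loadDocument, negatedTerms) {
  let fetches = 0;
  const results = [];
  for (const loc of indexEntries) {
    // Mirrors the hasNonTermPieces() guard: only queries with a
    // negation need to look at the full document.
    if (negatedTerms.length > 0) {
      const doc = loadDocument(loc); // stands in for docFor(loc)
      fetches += 1;
      const words = doc.text.toLowerCase().split(/\s+/);
      // Skip the entry when the document contains a negated term.
      if (negatedTerms.some((term) => words.includes(term))) {
        continue;
      }
    }
    results.push(loc);
  }
  return { results, fetches };
}

// In-memory stand-in for a collection keyed by record location.
const collection = new Map([
  [1, { text: "test suite for linux64" }],
  [2, { text: "test data loader" }],
  [3, { text: "unit test runner" }],
]);
const entriesForTest = [1, 2, 3]; // index entries matching the term "test"
const loadDoc = (loc) => collection.get(loc);

const plain = runTextFilter(entriesForTest, loadDoc, []);
const negated = runTextFilter(entriesForTest, loadDoc, ["data"]);
console.log("plain fetches:", plain.fetches);
console.log("negated fetches:", negated.fetches, "results:", negated.results);
```

In this model the per-entry load is the only difference between the two runs, so timing the real `docFor` call separately from `matchesNonTerm` would be one way to tell the two suspected costs apart.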

