Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Text Search
Labels:
- qi-text-search

Assigned Teams: Query Integration
Case:
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

On a larger data set, text queries that use negations or phrases take a very long time to run. Negations are slower than phrases, and a negation that filters out few results is slower than one that removes many.

> db.combined.find({$text: {$search: 'test'}}, {_id: 0, 'project': 1}).explain()
{
	"cursor" : "TextCursor",
	"n" : 32417,
	"nscannedObjects" : 32417,
	"nscanned" : 32417,
	"nscannedObjectsAllPlans" : 32417,
	"nscannedAllPlans" : 32417,
	"scanAndOrder" : false,
	"nYields" : 506,
	"nChunkSkips" : 0,
	"millis" : 114,
	"server" : "overlord.local:27017",
	"filterSet" : false
}
> db.combined.find({$text: {$search: 'test -linux64'}}, {_id: 0, 'project': 1}).explain()
{
	"cursor" : "TextCursor",
	"n" : 32283,
	"nscannedObjects" : 32417,
	"nscanned" : 32417,
	"nscannedObjectsAllPlans" : 32417,
	"nscannedAllPlans" : 32417,
	"scanAndOrder" : false,
	"nYields" : 1663,
	"nChunkSkips" : 0,
	"millis" : 33276,
	"server" : "overlord.local:27017",
	"filterSet" : false
}
> db.combined.find({$text: {$search: 'test -data'}}, {_id: 0, 'project': 1}).explain()
{
	"cursor" : "TextCursor",
	"n" : 20220,
	"nscannedObjects" : 32417,
	"nscanned" : 32417,
	"nscannedObjectsAllPlans" : 32417,
	"nscannedAllPlans" : 32417,
	"scanAndOrder" : false,
	"nYields" : 938,
	"nChunkSkips" : 0,
	"millis" : 15294,
	"server" : "overlord.local:27017",
	"filterSet" : false
}
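To make the three runs easier to compare, the figures above can be reduced to a per-entry cost and a slowdown relative to the plain query. This is only a rough sketch: the numbers are copied from the explain() outputs above, and the `summarize` helper and its field names are my own, not part of any MongoDB API.

```javascript
// Figures copied from the three explain() outputs above.
const runs = [
  { search: "test",          n: 32417, nscanned: 32417, millis: 114 },
  { search: "test -linux64", n: 32283, nscanned: 32417, millis: 33276 },
  { search: "test -data",    n: 20220, nscanned: 32417, millis: 15294 },
];

// Hypothetical helper: derive comparison figures for one run.
function summarize(run, baseline) {
  return {
    search: run.search,
    // Documents scanned but not returned, i.e. removed by the negation.
    filteredOut: run.nscanned - run.n,
    // Average wall-clock cost per scanned index entry, in milliseconds.
    msPerScanned: run.millis / run.nscanned,
    // How many times slower than the plain query.
    slowdown: run.millis / baseline.millis,
  };
}

const summary = runs.map((r) => summarize(r, runs[0]));
for (const s of summary) {
  console.log(
    `${s.search}: removed ${s.filteredOut}, ` +
    `${s.msPerScanned.toFixed(3)} ms/entry, ` +
    `${s.slowdown.toFixed(1)}x baseline`
  );
}
```

All three queries scan the same 32417 entries, so the difference in `millis` comes from per-entry work rather than from the size of the scan.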

I believe the problem stems from a large number of unamortized disk reads while scanning the text index. From db/exec/text.cpp:258-262:

        
        if (_params.query.hasNonTermPieces()) {
            if (!_ftsMatcher.matchesNonTerm(_params.index->getCollection()->docFor(loc))) {
                return PlanStage::NEED_TIME;
            }
        }

I believe the second line loads a document from disk at each step of the text index iterator, which would account for much of the extra time. However, the machine I ran this test on has an SSD, so some of the time may instead be spent in _ftsMatcher.matchesNonTerm.
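The suspected access pattern can be illustrated with a toy model. This is not server code: `runTextStage`, `fetchDocument`, and `matchesNonTerm` here are hypothetical stand-ins that only mirror the shape of the excerpt above, and the model counts only fetches made inside the filtering step (any later fetch needed to return the projected field is not modeled).

```javascript
// Toy model only. None of these functions are MongoDB server functions.
function runTextStage(indexEntries, collection, negatedTerm) {
  const stats = { fetches: 0, returned: 0 };

  // Stand-in for docFor(loc): every call counts as one document load.
  const fetchDocument = (loc) => {
    stats.fetches += 1;
    return collection.get(loc);
  };

  // Stand-in for the non-term check: reject documents containing the negated term.
  const matchesNonTerm = (doc) => !doc.text.split(/\s+/).includes(negatedTerm);

  for (const loc of indexEntries) {
    // Mirrors hasNonTermPieces(): the fetch happens only when a negation is present.
    if (negatedTerm !== null) {
      if (!matchesNonTerm(fetchDocument(loc))) {
        continue; // corresponds to returning NEED_TIME
      }
    }
    stats.returned += 1;
  }
  return stats;
}

const collection = new Map([
  [1, { text: "test linux64 build" }],
  [2, { text: "test data import" }],
  [3, { text: "test suite" }],
]);
const indexEntries = [1, 2, 3];

const plain = runTextStage(indexEntries, collection, null);
const negated = runTextStage(indexEntries, collection, "data");
console.log(plain, negated);
```

In this model the negated query loads one document per index entry regardless of how many results the negation removes, which is the cost I suspect. It does not separate disk time from time spent in the matcher; that would need profiling on the server itself.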

related to: SERVER-14578 Text search term negation processing should use text index instead of fetching (Open)

Assignee: [DO NOT USE] Backlog - Query Integration
Reporter: Austin Estep (Inactive)
Participants: [DO NOT USE] Backlog - Query Integration, Austin Estep
Votes: 1
Watchers: 8

Created: Jun 18 2014 07:24:33 PM UTC
Updated: Dec 28 2023 06:38:16 PM UTC
