- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Text Search, Query Integration
- Labels: (copied to CRM)
When working on a larger data set, text queries with negations and phrases take a very long time to resolve. Negations take longer than phrases, and negations that do not substantially filter the results take longer than those in which the negation removes many results.
> db.combined.find({$text: {$search: 'test'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 32417,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 506,
    "nChunkSkips" : 0,
    "millis" : 114,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
> db.combined.find({$text: {$search: 'test -linux64'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 32283,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 1663,
    "nChunkSkips" : 0,
    "millis" : 33276,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
> db.combined.find({$text: {$search: 'test -data'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 20220,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 938,
    "nChunkSkips" : 0,
    "millis" : 15294,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
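Back-of-the-envelope arithmetic on the explain() numbers above suggests the negated queries pay roughly constant extra cost per scanned document (a sketch; the variable names below are just local labels, and millis of course includes all per-query overhead, not only the negation work):

```javascript
// Per-document overhead implied by the three explain() runs in this ticket.
const baseline = { millis: 114, nscanned: 32417 };   // 'test'
const negLinux = { millis: 33276, nscanned: 32417 }; // 'test -linux64'
const negData  = { millis: 15294, nscanned: 32417 }; // 'test -data'

// Extra milliseconds per scanned document, relative to the non-negated query.
function perDocOverheadMs(run) {
  return (run.millis - baseline.millis) / run.nscanned;
}

console.log(perDocOverheadMs(negLinux).toFixed(2)); // "1.02" ms per scanned doc
console.log(perDocOverheadMs(negData).toFixed(2));  // "0.47" ms per scanned doc
```

About a millisecond per document for 'test -linux64' is consistent with a fetch (plus matching) for every entry the text-index cursor visits.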
I believe the problem stems from a large number of unamortized disk reads while scanning the text index. From db/exec/text.cpp:258-262:
if (_params.query.hasNonTermPieces()) {
    if (!_ftsMatcher.matchesNonTerm(_params.index->getCollection()->docFor(loc))) {
        return PlanStage::NEED_TIME;
    }
}
I believe the docFor(loc) call on the second line loads a document from disk at each step of the text index iterator, which accounts for much of the extra time. However, the machine I ran this test on has an SSD, so some of the time may actually be spent in _ftsMatcher.matchesNonTerm itself.
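Since the negated term is itself indexed, the negation could in principle be answered from the index alone, without fetching any documents. A rough illustration of that idea, using hypothetical in-memory posting lists rather than MongoDB's actual text index format:

```javascript
// Hypothetical posting lists: term -> set of document ids containing it.
const postings = {
  test:    new Set([1, 2, 3, 4, 5]),
  linux64: new Set([2, 4]),
};

// Index-side negation: subtract the negated term's posting list from the
// positive term's, never touching the documents themselves. (TextStage
// instead fetches and re-matches every candidate document.)
function searchWithNegation(positive, negated) {
  const negatedIds = postings[negated] || new Set();
  const result = [];
  for (const id of postings[positive] || []) {
    if (!negatedIds.has(id)) result.push(id);
  }
  return result;
}

console.log(searchWithNegation('test', 'linux64')); // [ 1, 3, 5 ]
```

This is only a sketch of the approach; the real index is a B-tree keyed by stemmed terms, so an actual implementation would intersect cursor ranges rather than materialized sets.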
- related to: SERVER-14578 Text search term negation processing should use text index instead of fetching (Open)