Details
- Improvement
- Resolution: Unresolved
- Major - P3
- Query Integration
Description
When working on a larger data set, text queries that contain negations or phrases take a very long time to resolve. Negations take longer than phrases, and a negation that filters out few results takes longer than one that removes many results.
> db.combined.find({$text: {$search: 'test'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 32417,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 506,
    "nChunkSkips" : 0,
    "millis" : 114,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
> db.combined.find({$text: {$search: 'test -linux64'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 32283,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 1663,
    "nChunkSkips" : 0,
    "millis" : 33276,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
> db.combined.find({$text: {$search: 'test -data'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 20220,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 938,
    "nChunkSkips" : 0,
    "millis" : 15294,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
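To make the comparison between the three runs easier to read, the figures from the explain() outputs can be reduced to "documents removed by the negation" and "milliseconds per scanned entry". The snippet below is only a back-of-the-envelope calculation on the numbers reported above; the `runs` and `summary` names are illustrative and not part of any MongoDB API.

```javascript
// Figures copied from the three explain() outputs in this report.
const runs = [
  { search: "test",          n: 32417, nscanned: 32417, millis: 114 },
  { search: "test -linux64", n: 32283, nscanned: 32417, millis: 33276 },
  { search: "test -data",    n: 20220, nscanned: 32417, millis: 15294 },
];

// For each run, derive how many scanned entries were discarded,
// the average cost per scanned entry, and the slowdown relative
// to the plain 'test' query (the first run).
const summary = runs.map((r) => ({
  search: r.search,
  removed: r.nscanned - r.n,
  msPerScanned: r.millis / r.nscanned,
  slowdown: r.millis / runs[0].millis,
}));

for (const s of summary) {
  console.log(
    s.search,
    "removed=" + s.removed,
    "ms/scanned=" + s.msPerScanned.toFixed(3),
    "slowdown=" + Math.round(s.slowdown) + "x"
  );
}
```

Read this way, the negation that removes only 134 documents ('-linux64') costs roughly twice as much as the one that removes 12,197 ('-data'), even though all three queries scan the same 32,417 entries.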
I believe the problem stems from a large number of unamortized disk reads while scanning the text index. From db/exec/text.cpp:258-262:
if (_params.query.hasNonTermPieces()) {
    if (!_ftsMatcher.matchesNonTerm(_params.index->getCollection()->docFor(loc))) {
        return PlanStage::NEED_TIME;
    }
}
I believe the second line loads a document from disk at each step of the text index iterator, which accounts for much of the extra time. However, the machine I ran this test on has an SSD, so some of the time may actually be spent in _ftsMatcher.matchesNonTerm.
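The suspected cost can be illustrated with a toy model that counts document fetches under two strategies: checking the negated term by fetching every candidate document, and answering it from the index postings alone. This is only a sketch under simplifying assumptions (whitespace tokenization, no stemming, an in-memory Map standing in for the text index); every function name here is hypothetical and none of it is taken from the MongoDB source.

```javascript
// Toy model only: none of these names come from the MongoDB source.
// Builds an index mapping each term to the Set of document ids containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of docs) {
    for (const term of text.toLowerCase().split(/\s+/)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Strategy A: fetch every candidate document and test the negated
// term against its text, one fetch per index entry for the search term.
function searchWithFetch(docs, index, term, negated) {
  let fetches = 0;
  const hits = [];
  for (const id of index.get(term) || []) {
    const text = docs.get(id); // stands in for a per-entry document load
    fetches++;
    if (!text.toLowerCase().split(/\s+/).includes(negated)) hits.push(id);
  }
  return { hits, fetches };
}

// Strategy B: answer the negation from the index postings alone,
// without loading any document.
function searchIndexOnly(index, term, negated) {
  const excluded = index.get(negated) || new Set();
  const hits = [...(index.get(term) || [])].filter((id) => !excluded.has(id));
  return { hits, fetches: 0 };
}

const docs = new Map([
  [1, "test linux64 build"],
  [2, "test data import"],
  [3, "test windows build"],
]);
const index = buildIndex(docs);
const withFetch = searchWithFetch(docs, index, "test", "linux64");
const indexOnly = searchIndexOnly(index, "test", "linux64");
console.log(withFetch, indexOnly);
```

Both strategies return the same matches in this model, but the first performs one fetch per candidate regardless of how many documents the negation removes, which is consistent with nscannedObjects staying at 32417 in the negated queries.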
Issue Links
- related to SERVER-14578 Text search term negation processing should use text index instead of fetching (Open)