- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Text Search, Query Integration
- Labels: (copied to CRM)
When working with a larger data set, text queries that use negations or phrases take a very long time to resolve. Negations take longer than phrases, and negations that do not substantially filter the results take longer than those that remove many results.
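For reference, the timings below come from a collection named combined that already has a text index; a minimal sketch of such a setup (the field name "summary" is only a placeholder, since the real schema and index definition are not shown here) would be:

// Minimal setup sketch: a text index on the collection being queried.
// The field name "summary" is a placeholder for the real indexed field(s).
db.combined.ensureIndex({ summary: "text" })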
> db.combined.find({$text: {$search: 'test'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 32417,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 506,
    "nChunkSkips" : 0,
    "millis" : *114*,
    "server" : "overlord.local:27017",
    "filterSet" : false
}

> db.combined.find({$text: {$search: 'test -linux64'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 32283,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 1663,
    "nChunkSkips" : 0,
    "millis" : *33276*,
    "server" : "overlord.local:27017",
    "filterSet" : false
}

> db.combined.find({$text: {$search: 'test -data'}}, {_id: 0, 'project': 1}).explain()
{
    "cursor" : "TextCursor",
    "n" : 20220,
    "nscannedObjects" : 32417,
    "nscanned" : 32417,
    "nscannedObjectsAllPlans" : 32417,
    "nscannedAllPlans" : 32417,
    "scanAndOrder" : false,
    "nYields" : 938,
    "nChunkSkips" : 0,
    "millis" : *15294*,
    "server" : "overlord.local:27017",
    "filterSet" : false
}
I believe the problem stems from a large number of unamortized disk reads while scanning the text index. From db/exec/text.cpp:258-262:
if (_params.query.hasNonTermPieces()) {
    if (!_ftsMatcher.matchesNonTerm(_params.index->getCollection()->docFor(loc))) {
        return PlanStage::NEED_TIME;
    }
}
I believe the docFor(loc) call on the second line fetches the full document from disk at each step of the text index iterator, which accounts for much of the extra time. However, the machine I ran this test on has an SSD, so it may be that some of the time is actually lost in _ftsMatcher.matchesNonTerm.
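One rough way to separate the two costs would be to time the negation query twice in a row: if the second, warm-cache run is still slow, more of the time is probably going to matchesNonTerm than to disk reads. A sketch of that check (itcount() is only used here to force full iteration of the cursor):

// Rough diagnostic sketch: time the negation query cold, then again once the
// documents should be in memory. A still-slow warm run would point at CPU time
// in matchesNonTerm rather than unamortized disk reads.
var query = { $text: { $search: 'test -linux64' } };
var projection = { _id: 0, 'project': 1 };

var start = new Date();
db.combined.find(query, projection).itcount();   // first run, cold(ish) cache
print('first run:  ' + (new Date() - start) + ' ms');

start = new Date();
db.combined.find(query, projection).itcount();   // second run, warm cache
print('second run: ' + (new Date() - start) + ' ms');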
- related to: SERVER-14578 Text search term negation processing should use text index instead of fetching (Open)