Core Server / SERVER-14296

Text search responds slowly when query has negation or phrasing

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Text Search
    • Labels:
    • Query Integration

      When working on a larger data set, text queries that contain negations or phrases take a very long time to run. Negations take longer than phrases, and a negation that filters out few results takes longer than one that removes many results.

      > db.combined.find({$text: {$search: 'test'}}, {_id: 0, 'project': 1}).explain()
      {
      	"cursor" : "TextCursor",
      	"n" : 32417,
      	"nscannedObjects" : 32417,
      	"nscanned" : 32417,
      	"nscannedObjectsAllPlans" : 32417,
      	"nscannedAllPlans" : 32417,
      	"scanAndOrder" : false,
      	"nYields" : 506,
      	"nChunkSkips" : 0,
      	"millis" : 114,
      	"server" : "overlord.local:27017",
      	"filterSet" : false
      }
      > db.combined.find({$text: {$search: 'test -linux64'}}, {_id: 0, 'project': 1}).explain()
      {
      	"cursor" : "TextCursor",
      	"n" : 32283,
      	"nscannedObjects" : 32417,
      	"nscanned" : 32417,
      	"nscannedObjectsAllPlans" : 32417,
      	"nscannedAllPlans" : 32417,
      	"scanAndOrder" : false,
      	"nYields" : 1663,
      	"nChunkSkips" : 0,
      	"millis" : 33276,
      	"server" : "overlord.local:27017",
      	"filterSet" : false
      }
      > db.combined.find({$text: {$search: 'test -data'}}, {_id: 0, 'project': 1}).explain()
      {
      	"cursor" : "TextCursor",
      	"n" : 20220,
      	"nscannedObjects" : 32417,
      	"nscanned" : 32417,
      	"nscannedObjectsAllPlans" : 32417,
      	"nscannedAllPlans" : 32417,
      	"scanAndOrder" : false,
      	"nYields" : 938,
      	"nChunkSkips" : 0,
      	"millis" : 15294,
      	"server" : "overlord.local:27017",
      	"filterSet" : false
      }
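
      As a rough check on the three runs above, the following sketch derives the slowdown and the average time per scanned index entry from the explain() figures. The numbers are copied from the output; the `runs` array and the `summarize` helper are names made up for this illustration, and the script runs under Node.js rather than against the server.

```javascript
// Figures copied from the three explain() outputs above.
const runs = [
  { search: "test",          n: 32417, nscanned: 32417, millis: 114 },
  { search: "test -linux64", n: 32283, nscanned: 32417, millis: 33276 },
  { search: "test -data",    n: 20220, nscanned: 32417, millis: 15294 },
];

// Derive per-run ratios; the first run (plain term query) is the baseline.
function summarize(runs) {
  const baseline = runs[0].millis;
  return runs.map((r) => ({
    search: r.search,
    filteredOut: r.nscanned - r.n,       // documents removed by the negation
    msPerScanned: r.millis / r.nscanned, // average cost per scanned entry
    slowdown: r.millis / baseline,       // relative to the plain query
  }));
}

const summary = summarize(runs);
for (const s of summary) {
  console.log(
    s.search,
    "filteredOut=" + s.filteredOut,
    "msPerScanned=" + s.msPerScanned.toFixed(4),
    "slowdown=" + s.slowdown.toFixed(1)
  );
}
```

      By this arithmetic, the negation that removes only 134 documents costs about 1.03 ms per scanned entry (roughly 290 times the plain query), while the one that removes 12197 documents costs about 0.47 ms per scanned entry (roughly 134 times).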
      

      I believe the problem stems from a large number of unamortized disk reads while scanning the text index. From db/exec/text.cpp:258-262:

              
              if (_params.query.hasNonTermPieces()) {
                  if (!_ftsMatcher.matchesNonTerm(_params.index->getCollection()->docFor(loc))) {
                      return PlanStage::NEED_TIME;
                  }
              }
      

      I believe the second line (the docFor(loc) call) loads a document from disk at each step of the text index iterator, which accounts for much of the extra time. However, the machine I ran this test on has an SSD, so some of the time may actually be spent in _ftsMatcher.matchesNonTerm.
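
      One way to tell the two suspects apart would be to time the fetch and the match separately. The sketch below is a toy model of the quoted filtering step, not MongoDB code: `scanTextIndex`, `fetchDoc`, `withoutData` and the in-memory `docs` map are hypothetical stand-ins, and it assumes one fetch per visited index entry, as the snippet suggests.

```javascript
// Toy model of the filtering step quoted above. Nothing here is MongoDB
// code: scanTextIndex, fetchDoc and the matcher are hypothetical stand-ins.
function scanTextIndex(indexEntries, hasNonTermPieces, fetchDoc, matchesNonTerm) {
  const stats = { fetches: 0, fetchNs: 0n, matchNs: 0n, returned: 0 };
  for (const loc of indexEntries) {
    if (hasNonTermPieces) {
      // One document fetch per index entry, as in the quoted snippet.
      let t0 = process.hrtime.bigint();
      const doc = fetchDoc(loc);
      stats.fetchNs += process.hrtime.bigint() - t0;
      stats.fetches += 1;

      // The matcher is timed separately so the two costs can be compared.
      t0 = process.hrtime.bigint();
      const ok = matchesNonTerm(doc);
      stats.matchNs += process.hrtime.bigint() - t0;
      if (!ok) continue; // models returning PlanStage::NEED_TIME
    }
    stats.returned += 1;
  }
  return stats;
}

// Tiny in-memory "collection" keyed by record location.
const docs = new Map([
  [1, "test data"],
  [2, "test linux64"],
  [3, "test suite"],
]);
const fetchDoc = (loc) => docs.get(loc);
const withoutData = (doc) => !doc.includes("data"); // models the negation -data

const plain = scanTextIndex([1, 2, 3], false, fetchDoc, withoutData);
const negated = scanTextIndex([1, 2, 3], true, fetchDoc, withoutData);
console.log("plain:", plain.fetches, "fetches,", plain.returned, "returned");
console.log("negated:", negated.fetches, "fetches,", negated.returned, "returned");
```

      In this model the plain query never touches a document, while the negated query fetches one document per index entry whether or not that document survives the filter. Comparing `fetchNs` with `matchNs` in a real instrumented build would show which of the two dominates.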

            Assignee:
            backlog-query-integration [DO NOT USE] Backlog - Query Integration
            Reporter:
            austin.estep@10gen.com Austin Estep (Inactive)