  1. Core Server
  2. SERVER-11014

Text Search does not stem documents

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.5.2
    • Component/s: Text Search
    • Environment:
      Fedora Linux 19
    • ALL

      Add a document to the collection like this:

        {
          "name": "Physical",
          "index": true,
          "code": "...",
          "description": "This Thang has physical presence (position, size, shape).",
          "system": "physics"
        }
      

      Create a sparse index on the collection with this key specification:

      {index: 1, name: 'text', description: 'text', system: 'text'}

      The index in my system looks like this:

      {
      	"v": 1,
      	"key": {
      		"index": 1,
      		"_fts": "text",
      		"_ftsx": 1
      	},
      	"ns": "coco.level.components",
      	"name": "search index",
      	"sparse": true,
      	"background": true,
      	"safe": null,
      	"weights": {
      		"description": 1,
      		"name": 1,
      		"system": 1
      	},
      	"default_language": "english",
      	"language_override": "language",
      	"textIndexVersion": 1
      }
      

      Now from the client, run searches like:

      db.level.components.runCommand('text', {search: 'physical', filter: {index:true}}) // fail
      db.level.components.runCommand('text', {search: 'presence', filter: {index:true}}) // fail
      db.level.components.runCommand('text', {search: 'size', filter: {index:true}}) // hit
      db.level.components.runCommand('text', {search: 'physics', filter: {index:true}}) // fail
      

      In each failing case, the results showed that the search term had been stemmed: 'physics' and 'physical' became 'physic', and 'presence' became 'presenc'. 'size' was left unchanged and matched the sample document. If I update the document to contain 'presenc' instead of 'presence' and then search for 'presence', the search succeeds. So the search appears to work only when the document text has been stemmed manually.

      I don't know if the system is failing to stem the words before indexing them, but that's what it looks like based on these tests.
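
The suspected mismatch can be illustrated with a toy model. This is only a sketch of the hypothesis, not MongoDB's implementation: `toyStem`, `STEMS`, `tokenize`, `buildIndex`, and `matches` are hypothetical names, and the stem table covers just the words from this report rather than a real English stemmer.

```javascript
// Hypothetical stem table covering only the words seen in this report.
const STEMS = { physical: 'physic', physics: 'physic', presence: 'presenc' };

// Return the stem for a word, or the word itself if it has no entry.
function toyStem(word) {
  return Object.prototype.hasOwnProperty.call(STEMS, word) ? STEMS[word] : word;
}

// Lowercase the text and split it on anything that is not a letter.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Collect the index terms for the given fields, optionally stemming them.
function buildIndex(doc, fields, stemAtIndexTime) {
  const terms = new Set();
  for (const field of fields) {
    for (const token of tokenize(doc[field])) {
      terms.add(stemAtIndexTime ? toyStem(token) : token);
    }
  }
  return terms;
}

// The search term is always stemmed, as observed in the results above.
function matches(index, searchTerm) {
  return index.has(toyStem(searchTerm.toLowerCase()));
}

const doc = {
  name: 'Physical',
  description: 'This Thang has physical presence (position, size, shape).',
  system: 'physics'
};
const fields = ['name', 'description', 'system'];

const unstemmed = buildIndex(doc, fields, false); // suspected behavior
const stemmed = buildIndex(doc, fields, true);    // expected behavior

for (const term of ['physical', 'presence', 'size', 'physics']) {
  console.log(term, 'unstemmed index:', matches(unstemmed, term),
              'stemmed index:', matches(stemmed, term));
}
```

Under these assumptions, the unstemmed index reproduces the pattern from the searches above (only 'size' matches), while the stemmed index matches all four terms.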


      The text search feature appears to stem the search term but not the indexed data. For example, when a document contains 'physical' and I search for 'physical', the search term is stemmed to 'physic', and the search returns nothing even though it should be an exact match.

            Assignee:
            Unassigned
            Reporter:
            Scott Erickson (sderickson)
            Votes:
            0
            Watchers:
            5
