[SERVER-11014] Text Search does not stem documents Created: 03/Oct/13  Updated: 10/Dec/14  Resolved: 03/Oct/13

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.5.2
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Scott Erickson
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: indexing, text_index
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Fedora Linux 19


Issue Links:
Duplicate
duplicates SERVER-10150 Inconsistent / missing text search re... Closed
Operating System: ALL
Steps To Reproduce:

Add a document to the collection like this:

  {
    "name": "Physical",
    "index": true,
    "code": "...",
    "description": "This Thang has physical presence (position, size, shape).",
    "system": "physics",
  }

Add a sparse index to the collection with these keys:

{index: 1, name: 'text', description: 'text', system: 'text'}

The one in my system looks like this:

{
	"v": 1,
	"key": {
		"index": 1,
		"_fts": "text",
		"_ftsx": 1
	},
	"ns": "coco.level.components",
	"name": "search index",
	"sparse": true,
	"background": true,
	"safe": null,
	"weights": {
		"description": 1,
		"name": 1,
		"system": 1
	},
	"default_language": "english",
	"language_override": "language",
	"textIndexVersion": 1
}

Now from the client, run searches like:

db.level.components.runCommand('text', {search: 'physical', filter: {index:true}}) // fail
db.level.components.runCommand('text', {search: 'presence', filter: {index:true}}) // fail
db.level.components.runCommand('text', {search: 'size', filter: {index:true}}) // hit
db.level.components.runCommand('text', {search: 'physics', filter: {index:true}}) // fail

When I checked the results of each failed search, the search term had been stemmed: 'physics' and 'physical' became 'physic', and 'presence' became 'presenc'. 'size' stayed the same and hit the sample. If I update the document to contain 'presenc' instead of 'presence' and then search for 'presence', the search succeeds. So the search appears to work when the document text is stemmed manually.

I don't know if the system is failing to stem the words before indexing them, but that's what it looks like based on these tests.

Description

The text search feature appears to stem the search term, but not the data itself. For example, when the document contains 'physical' and I search for 'physical', the search term becomes 'physic', but the search does not return what should be a direct match.



Comments
Comment by Scott Erickson [ 03/Oct/13 ]

After more testing, I figured it out. The objects I'm actually working with are much larger; I cut down the number of properties to keep the issue succinct. One of the properties I left out is 'language', which specifies the programming language of the 'code' field. According to the docs:

http://docs.mongodb.org/manual/tutorial/specify-language-for-text-index/

'language' is a field used to specify what spoken language the document is in for the text search. In my case, the database was trying to stem the document for the language 'coffeescript'.

We'll figure out a different property name for this case. Sorry for submitting a false issue, and thanks for helping me sort this one out for myself.
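
To illustrate the mismatch, here is a toy model in plain JavaScript. It is only a sketch and not how MongoDB implements text search: the stem table is copied from the queryDebugString values in this ticket, the helper names (stem, tokenize, indexedTerms, search) and the alternative field name 'textLanguage' are hypothetical, and it assumes that an unrecognized language such as 'coffeescript' leaves document terms unstemmed, which is what the tests above suggest.

```javascript
// Toy model only: this is NOT MongoDB's implementation.
// Stems copied from the queryDebugString values reported in this ticket.
const ENGLISH_STEMS = new Map([
  ["physical", "physic"],
  ["physics", "physic"],
  ["presence", "presenc"],
]);

// Hypothetical stemmer: English uses the lookup table above; any other
// language (assumed to include "coffeescript") leaves the term unchanged.
function stem(term, language) {
  const t = term.toLowerCase();
  return language === "english" ? (ENGLISH_STEMS.get(t) || t) : t;
}

// Split text into lowercase alphabetic tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Build the set of indexed terms. The document's language is read from
// doc[overrideField] when that field is a string, otherwise "english".
function indexedTerms(doc, overrideField) {
  const language =
    typeof doc[overrideField] === "string" ? doc[overrideField] : "english";
  const terms = new Set();
  for (const field of ["name", "description", "system"]) {
    for (const token of tokenize(String(doc[field] || ""))) {
      terms.add(stem(token, language));
    }
  }
  return terms;
}

// The search term is always stemmed as English, as in the ticket's output.
function search(doc, term, overrideField) {
  return indexedTerms(doc, overrideField).has(stem(term, "english"));
}

const doc = {
  name: "Physical",
  description: "This Thang has physical presence (position, size, shape).",
  system: "physics",
  language: "coffeescript", // meant as the programming language of 'code'
};

for (const term of ["physical", "presence", "size", "physics"]) {
  console.log(term, search(doc, term, "language"), search(doc, term, "textLanguage"));
}
```

With 'language' as the override field, only 'size' matches in this model, which mirrors the failures I reported. When the model reads the language from a field the document does not have ('textLanguage' here), all four terms match. The "language_override" entry in the index definition above names the field that plays this role in the real index.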

Comment by Daniel Pasette (Inactive) [ 03/Oct/13 ]

I am unable to reproduce this reported behavior in 2.5.2. All text searches return successfully. Are you doing anything different than the following steps?

> db.version()
2.5.2
> db.f.drop()
true
> db.f.insert({   "name": "Physical",   "index": true,   "code": "...",   "description": "This Thang has physical presence (position, size, shape).",   "system": "physics", })
> db.f.ensureIndex({index: 1, name: 'text', description: 'text', system: 'text'}, {background:true, sparse:true})
> db.f.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "fts.f",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"index" : 1,
			"_fts" : "text",
			"_ftsx" : 1
		},
		"ns" : "fts.f",
		"name" : "index_1_name_text_description_text_system_text",
		"background" : true,
		"sparse" : true,
		"weights" : {
			"description" : 1,
			"name" : 1,
			"system" : 1
		},
		"default_language" : "english",
		"language_override" : "language",
		"textIndexVersion" : 1
	}
]
> db.f.runCommand('text', {search: 'physical', filter: {index:true}})
{
	"queryDebugString" : "physic||||||",
	"language" : "english",
	"results" : [
		{
			"score" : 2.5833333333333335,
			"obj" : {
				"_id" : ObjectId("524ced9663b033a904373eea"),
				"name" : "Physical",
				"index" : true,
				"code" : "...",
				"description" : "This Thang has physical presence (position, size, shape).",
				"system" : "physics"
			}
		}
	],
	"stats" : {
		"nscanned" : 1,
		"nscannedObjects" : 0,
		"n" : 1,
		"nfound" : 1,
		"timeMicros" : 198
	},
	"ok" : 1
}
> db.f.runCommand('text', {search: 'presence', filter: {index:true}})
{
	"queryDebugString" : "presenc||||||",
	"language" : "english",
	"results" : [
		{
			"score" : 0.5833333333333334,
			"obj" : {
				"_id" : ObjectId("524ced9663b033a904373eea"),
				"name" : "Physical",
				"index" : true,
				"code" : "...",
				"description" : "This Thang has physical presence (position, size, shape).",
				"system" : "physics"
			}
		}
	],
	"stats" : {
		"nscanned" : 1,
		"nscannedObjects" : 0,
		"n" : 1,
		"nfound" : 1,
		"timeMicros" : 123
	},
	"ok" : 1
}
> db.f.runCommand('text', {search: 'size', filter: {index:true}})
{
	"queryDebugString" : "size||||||",
	"language" : "english",
	"results" : [
		{
			"score" : 0.5833333333333334,
			"obj" : {
				"_id" : ObjectId("524ced9663b033a904373eea"),
				"name" : "Physical",
				"index" : true,
				"code" : "...",
				"description" : "This Thang has physical presence (position, size, shape).",
				"system" : "physics"
			}
		}
	],
	"stats" : {
		"nscanned" : 1,
		"nscannedObjects" : 0,
		"n" : 1,
		"nfound" : 1,
		"timeMicros" : 165
	},
	"ok" : 1
}
> db.f.runCommand('text', {search: 'physics', filter: {index:true}})
{
	"queryDebugString" : "physic||||||",
	"language" : "english",
	"results" : [
		{
			"score" : 2.5833333333333335,
			"obj" : {
				"_id" : ObjectId("524ced9663b033a904373eea"),
				"name" : "Physical",
				"index" : true,
				"code" : "...",
				"description" : "This Thang has physical presence (position, size, shape).",
				"system" : "physics"
			}
		}
	],
	"stats" : {
		"nscanned" : 1,
		"nscannedObjects" : 0,
		"n" : 1,
		"nfound" : 1,
		"timeMicros" : 132
	},
	"ok" : 1
}
