[SERVER-11742] Spanish text search stemmer Created: 16/Nov/13  Updated: 10/Dec/14  Resolved: 28/Nov/13

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Miguel G Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS


Operating System: Linux
Participants:

 Description   

The stemmer apparently removes the 'o' at the end of each word. We have quite a few words that end in 'o', so you can see how problematic this is.

So if I run this query:

db.collection.runCommand( "text", { search: "barco", language: "spanish" } )

I get the following output, and no results even though there's a field containing the word 'barco' (notice how the 'o' has been removed in the queryDebugString field):

{
	"queryDebugString" : "barc||||||",
	"language" : "spanish",
	"results" : [ ],
	"stats" : {
		"nscanned" : 0,
		"nscannedObjects" : 0,
		"n" : 0,
		"nfound" : 0,
		"timeMicros" : 1208
	},
	"ok" : 1
}

But if I run the same query with English as the language:

db.collection.runCommand( "text", { search: "barco", language: "english" } )

I get a result (notice that the 'o' has not been removed this time):

{
	"queryDebugString" : "barco||||||",
	"language" : "english",
	"results" : [
		{
			"score" : 1.1,
			"obj" : {
				"_id" : ObjectId("527822523dd360464b4fd1d7"),
...
}

Any idea why the 'o' is being removed in Spanish?

Many thanks



 Comments   
Comment by J Rassi [ 28/Nov/13 ]

Duplicate of mailing list post: https://groups.google.com/forum/#!topic/mongodb-user/RJN77A6A_9o
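
The ticket itself does not state the cause. One possible explanation is that the text index was built with a different language (for example the default, English) than the one passed to the query, so the stored term "barco" never matches the Spanish query stem "barc". The sketch below only illustrates that kind of mismatch. The names toyStemmers, buildIndex and search are hypothetical, and the stemmers are toy stand-ins that reproduce the two queryDebugString values shown above. They are not MongoDB's actual stemmers.

```javascript
// Toy stand-ins for language-specific stemmers (assumption: NOT MongoDB's real stemmers).
const toyStemmers = {
  // Mimics the observed Spanish behaviour: "barco" -> "barc".
  spanish: (word) => word.replace(/o$/, ""),
  // Mimics the observed English behaviour: "barco" stays "barco".
  english: (word) => word,
};

// Build a tiny inverted index: stemmed term -> list of document ids.
function buildIndex(docs, language) {
  const stem = toyStemmers[language];
  const index = new Map();
  for (const doc of docs) {
    for (const word of doc.text.toLowerCase().split(/\s+/)) {
      const term = stem(word);
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(doc.id);
    }
  }
  return index;
}

// Stem the query with the query-time language, then look the term up.
function search(index, query, language) {
  const term = toyStemmers[language](query.toLowerCase());
  return index.get(term) || [];
}

// Hypothetical sample document containing the word from the ticket.
const docs = [{ id: 1, text: "un barco grande" }];
const englishIndex = buildIndex(docs, "english");
const spanishIndex = buildIndex(docs, "spanish");

console.log(search(englishIndex, "barco", "spanish")); // index term "barco" vs query term "barc"
console.log(search(englishIndex, "barco", "english")); // both sides use "barco"
console.log(search(spanishIndex, "barco", "spanish")); // both sides use "barc"
```

If this is what happened, indexing and querying with the same language should make the stems line up again. In MongoDB the index-time language is controlled by the default_language option of the text index. Whether that applies to this particular collection is an assumption.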

Generated at Thu Feb 08 03:26:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.