[DOCS-3865] Comment on: "manual/core/index-text.txt" Created: 04/Aug/14  Updated: 03/Nov/17  Resolved: 04/Aug/14

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: 01112017-cleanup

Type: Improvement Priority: Major - P3
Reporter: Claudio Marrero Assignee: Unassigned
Resolution: Done Votes: 0
Labels: collector-298ba4e7
Environment:

Windows 7, Linux 14.10, Linux 12.10

Location: http://docs.mongodb.org/manual/core/index-text/#text-search
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
Screen Resolution: 1920 x 1080
repo: docs
source: core/index-text



 Description   

I previously reported an error here: https://jira.mongodb.org/browse/DOCS-3863

Kay Kim closed the ticket with this reply:

Hi Claudio –
Thanks for taking the time to file this ticket with very clear examples.
The text search matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document.
So, for your example, I believe the complete stemmed word for textt is textt and the stemmed search term for text is text. Since the match is on the complete stemmed word, the two will not match.
For more information, refer to http://docs.mongodb.org/manual/core/index-text/#text-search

The thing is, Kay Kim is wrong.

In MongoDB 2.6.3 you can search for a word. For example, suppose you have the following documents:

db.foo.find({});

{ "_id" : ObjectId("53dfd9453e1e4201402d2f5b"), "desc" : "This is a bug textt" }
{ "_id" : ObjectId("53dfd9453e1e4201402d2f5b"), "desc" : "Text" }
{ "_id" : ObjectId("53dfd9453e1e4201402d2f5b"), "desc" : "I wrote a text" }
{ "_id" : ObjectId("53dfd9453e1e4201402d2f5d"), "desc" : "This is a string with text" }

and you have a text index on desc:

db.foo.ensureIndex( { "desc": "text" } );

And you run this query searching for "text":

db.foo.find({$text:{ $search:"text" }});

You will get:

{ "_id" : ObjectId("53dfd9453e1e4201402d2f5b"), "desc" : "Text" }
{ "_id" : ObjectId("53dfd9453e1e4201402d2f5b"), "desc" : "I wrote a text" }
{ "_id" : ObjectId("53dfd9453e1e4201402d2f5d"), "desc" : "This is a string with text" }

Notice that the document:

{ "_id" : ObjectId("53dfd9453e1e4201402d2f5b"), "desc" : "This is a bug textt" }

has not been returned.

I tried this with the following cases.

I added many documents containing these words in different phrases:

walk / walkk
text / textt
hello / helloo
name / namee

I think the bug here is related to the doubled character at the end of the word, or something like that.

I don't know why Kim told me that this can't be done with MongoDB; I ran all of these queries in the mongo shell on MongoDB 2.6.3.

Best
C

PS: For more details, see this ticket: https://jira.mongodb.org/browse/SERVER-380, where Jason Rassi says that the feature was deployed in version 2.6.3.

PS: And sorry for my English.



 Comments   
Comment by Kay Kim (Inactive) [ 04/Aug/14 ]

Just one more thing that might clarify:

When I state that
The text search matches on the complete stemmed words.

What I mean is that if we have the string desc: "Night fell early", the stemmer stems each term in the sentence, producing three separate stemmed words: night, fell, and early. This allows you to search on any of the three stemmed terms. Text search is case insensitive, which is why I wrote night instead of Night, but I could just as easily have typed Night.

Hope this clarifies a bit.
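As a minimal sketch of the matching rule described in this comment (not MongoDB's implementation), the JavaScript below assumes a simplified tokenizer that splits on non-letters and lowercases each token. Stemming is stubbed out as the identity function and stop words are not removed, so it models only the "whole term, case-insensitive" comparison; `stem`, `indexTerms`, and `matches` are hypothetical names.

```javascript
// Placeholder stemmer (assumption): returns the token unchanged.
// MongoDB's real stemmer is language-specific and is not modeled here.
function stem(token) {
  return token;
}

// Turn a string into its set of index terms: lowercase it (text search
// is case insensitive), split on non-letters, and stem each token.
function indexTerms(text) {
  const terms = new Set();
  for (const token of text.toLowerCase().split(/[^a-z]+/)) {
    if (token.length > 0) {
      terms.add(stem(token));
    }
  }
  return terms;
}

// A single-word search matches only if its lowercased, stemmed form is
// exactly one of the index terms: no prefix or substring matching.
function matches(text, searchWord) {
  return indexTerms(text).has(stem(searchWord.toLowerCase()));
}

console.log([...indexTerms("Night fell early")]); // [ 'night', 'fell', 'early' ]
console.log(matches("Night fell early", "NIGHT")); // true
console.log(matches("Night fell early", "nigh"));  // false
console.log(matches("blueberry", "blue"));         // false
```

The last line mirrors the blueberry/blue example from the original reply: a shared prefix is not enough, because the comparison is between complete terms.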

Comment by Kay Kim (Inactive) [ 04/Aug/14 ]

Hi Claudio –

There seems to be some confusion regarding stemmed words. Stemming uses suffix-matching logic to reduce words to their stems.

If I have the following documents:

{ "_id" : 8, "desc" : "This is an correct texts." }
{ "_id" : 2, "desc" : "This is also correct texting." }
{ "_id" : 7, "desc" : "This is an correct texted." }
{ "_id" : 12, "desc" : "This is a correct text." }
{ "_id" : 3, "desc" : "This is an incorrect texte." }

The stemming logic recognizes certain suffixes and therefore stems "texts", "texting", "texted", "text", and "texte" all to text. So these documents are returned by a text search on any term that also stems to text.

For example, "texting" also stems to text, so

db.foo.find( { $text: { $search: "texting" } } )

will match the above documents.

However, "tex" stems to tex, so the following query will not match any of the above documents:

db.foo.find( { $text: { $search: "tex" } } )

In your example, the stemmer has no suffix rule that applies to "textt", so the stemmed version of "textt" is "textt". Therefore "text" will not match "textt".

As a side note, the stemmer stems "textting" to "text", so "textting" will match "text" but not "textt".
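To make the suffix examples concrete, here is a minimal JavaScript sketch with a hypothetical toy stemmer. Its few rules (strip "ing" or "ed" and then collapse a doubled final consonant, otherwise strip a final "s" or "e") were chosen only so that it reproduces the stems described in this comment; they are not the rules of MongoDB's language-specific stemmer and will disagree with it on other words. `toyStem` and `matching` are illustrative names.

```javascript
// Hypothetical toy stemmer: NOT MongoDB's stemmer, only a few
// suffix rules chosen to reproduce the examples in this comment.
function toyStem(word) {
  let w = word.toLowerCase();
  // Strip "ing" or "ed" if at least 3 letters remain, then collapse a
  // doubled final consonant (e.g. "textting" -> "textt" -> "text").
  for (const suffix of ["ing", "ed"]) {
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      w = w.slice(0, -suffix.length);
      if (/([^aeiou])\1$/.test(w)) {
        w = w.slice(0, -1);
      }
      return w;
    }
  }
  // Otherwise strip a final "s" or "e" if at least 4 letters remain.
  if (w.length >= 5 && (w.endsWith("s") || w.endsWith("e"))) {
    return w.slice(0, -1);
  }
  // No rule applies: the word is its own stem (e.g. "textt", "tex").
  return w;
}

// Document words from the examples above, plus the reporter's "textt".
const docs = ["texts", "texting", "texted", "text", "texte", "textt"];

// A search term matches a document word only if the two stems are equal.
function matching(term) {
  return docs.filter((d) => toyStem(d) === toyStem(term));
}

console.log(matching("texting"));  // [ 'texts', 'texting', 'texted', 'text', 'texte' ]
console.log(matching("tex"));      // []
console.log(matching("textting")); // [ 'texts', 'texting', 'texted', 'text', 'texte' ]
console.log(matching("text").includes("textt")); // false
```

Because matching compares complete stems, an extra letter that no rule removes (as in "textt") produces a different term, while a recognized suffix (as in "textting") is stripped back to the shared stem.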

Hope this helps.

Generated at Thu Feb 08 07:46:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.