[SERVER-15090] Improve Text Indexes to support partial word match Created: 29/Aug/14 Updated: 28/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 2.6.4 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Markus Padourek | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 133 |
| Labels: | qi-text-search |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Integration |
| Participants: | Aaron Motacek, Alex B., angieduan, Ashfaq nisar, Backlog - Query Integration, Billy Tetrud, Cassio Mosqueira, Chuck May, Dan Delaney, Dan Syrstad, Doug Tarr, Eduard Bosch, Govardhan, HivePoint Admin, Jason Benassi, Jim Liddell, Jon Lippincott, Markus Padourek, Mattia Alfieri, oscar milla bret, Santhosh, Sunil K Samanta, Timothy Frietas, Vytautas Pranskunas, Ygor Lemos |
| Case: | (copied to CRM) |
| Description |
|
As mentioned here http://docs.mongodb.org/manual/core/index-text/ it is possible to create text indexes, search them and sort the results by textScore, which all works well. Support for partial word matches could be enabled either when you create the index or when you query the database. As a potential implementation, there could be an additional option:
And the options for match could be "whole" (default), "prefix", "postfix", "partial". This would in my opinion majorly improve the full-text search in a lot of use-cases, for example building a search for a news site and you wouldn't have to add additional dependencies such as elasticsearch, which could be an overkill in some scenarios. |
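To make the four proposed modes concrete, here is a minimal Python sketch of how they might behave on individual tokens. It is only an illustration of the request, not MongoDB behavior: the functions `term_matches` and `search` are hypothetical, and splitting on whitespace stands in for the real text-index tokenizer (no stemming or stop words).

```python
def term_matches(token: str, term: str, match: str = "whole") -> bool:
    """Return True if an indexed token matches the search term under the given mode."""
    token, term = token.lower(), term.lower()
    if match == "whole":      # current text-index behavior: the full token must match
        return token == term
    if match == "prefix":     # term matches the start of the token
        return token.startswith(term)
    if match == "postfix":    # term matches the end of the token
        return token.endswith(term)
    if match == "partial":    # term appears anywhere inside the token
        return term in token
    raise ValueError(f"unknown match mode: {match!r}")


def search(texts, term, match="whole"):
    """Return the texts that contain at least one token matching the term."""
    return [t for t in texts if any(term_matches(tok, term, match) for tok in t.split())]
```

For example, with the headlines "Elections postponed", "Post office closes" and "Compost tips", searching for "post" in "whole" mode finds only the second headline, "prefix" adds the first, "postfix" finds the second and third, and "partial" finds all three.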
| Comments |
| Comment by Vytautas Pranskunas [ 30/Sep/22 ] |
|
Any progress on this? This is a critical feature for performance with non-English alphabets, because the behavior of $diacriticSensitive cannot be achieved with regex |
| Comment by Jon Lippincott [ 29/Jan/22 ] |
|
Hello. Any updates here? I don't see how search features are useful if there is no partial search. It's simply a requirement of any modern-day search engine.
Any comment would be greatly appreciated. Thanks. |
| Comment by Ashfaq nisar [ 07/Dec/21 ] |
|
Any update on this issue? |
| Comment by Chuck May [ 09/Nov/21 ] |
|
This issue has been open for over 7 years. Is there any movement? I noticed that Percona Server for MongoDB, an open source fork of MongoDB, has added this feature as an "ngram" language. See https://docs.percona.com/percona-server-for-mongodb/4.4/ngram-full-text-search.html |
| Comment by Aaron Motacek [ 24/Aug/21 ] |
|
This would be a great feature. |
| Comment by Doug Tarr [ 15/Jun/20 ] |
|
Hi Cassio - You can filter in Atlas Search to your tenant by using a compound filter query. You would need to add your tenant id to your Atlas Search index to make this work. |
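As a rough sketch of what such a compound filter query might look like, the Python helper below builds an aggregation pipeline as plain dictionaries. The helper name, the `tenantId` field, the `name` path and the `default` index name are all assumptions for illustration. It also assumes the searched field is mapped for the `autocomplete` operator and that the tenant id is indexed as a type the `equals` operator accepts in your Atlas Search version.

```python
def tenant_autocomplete_pipeline(tenant_id, typed, path="name", index="default", limit=10):
    """Build a $search pipeline that restricts partial matches to a single tenant."""
    return [
        {
            "$search": {
                "index": index,  # assumed index name
                "compound": {
                    # partial ("as-you-type") match on the searched field
                    "must": [{"autocomplete": {"query": typed, "path": path}}],
                    # tenant restriction; filter clauses do not affect the score
                    "filter": [{"equals": {"path": "tenantId", "value": tenant_id}}],
                },
            }
        },
        {"$limit": limit},
    ]
```

The resulting list could then be passed to a driver's aggregate call; `$search` remains the first stage, and the tenant restriction lives inside it rather than in a later `$match`.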
| Comment by Cassio Mosqueira [ 13/Jun/20 ] |
|
I don't think any of the solutions offered by MongoDB works well for a simple scenario: implementing a quick search with autocomplete on multiple fields (name, email, phone number). For this to work, we need partial matches, but text indexes don't support partial matches. So we looked into Atlas Search, but since the search needs to be the first stage in the pipeline, it makes no sense to use it on a multi-tenant database. Are we missing something? |
| Comment by HivePoint Admin [ 01/Apr/20 ] |
|
FYI, it's really easy to add your own ngram support. Just add an extra field to the document containing an array of ngrams of the words in your searchable text. (It's trivial to generate ngrams for any string.) Then add that ngrams field into your text search index (with a lower weighting than the other fields). We've been doing this for more than a year and it works great. |
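The index described above could be declared roughly as follows. This is a sketch, not the commenter's actual setup: the helper name, the index name `text_with_ngrams`, the field names and the weights 10 and 1 are all assumptions. The helper only builds a `createIndexes` command document as a plain dictionary.

```python
def text_index_command(collection, text_fields, ngram_field, field_weight=10, ngram_weight=1):
    """Build a createIndexes command for a text index that also covers an n-gram field."""
    # every regular field and the n-gram field become part of one text index
    key = {name: "text" for name in [*text_fields, ngram_field]}
    # the n-gram field gets a lower weight so whole-word hits rank higher
    weights = {name: field_weight for name in text_fields}
    weights[ngram_field] = ngram_weight
    return {
        "createIndexes": collection,
        "indexes": [{"key": key, "name": "text_with_ngrams", "weights": weights}],
    }
```

With a driver such as PyMongo, a document like this could be sent through the database command interface, or the same key and weights could be given to the driver's index-creation helper.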
| Comment by Doug Tarr [ 01/Apr/20 ] |
|
Hi Jason - ngrams are definitely something we are working on in our near-term roadmap. We don't yet have a date for that feature however. In the meantime, you can vote for the features that are important to you on our uservoice site: |
| Comment by Jason Benassi [ 01/Apr/20 ] |
|
Hi Timothy and team - We work with James Kovaks and made him aware of the need for us to also request the partial/fuzzy text search (ngram) functionality. Any idea when you guys might be implementing this in Atlas Search? Our current project has it as a requirement. We are looking to work around it but the options are not great - e.g. transforming/syncing with ES. It would be nice to just be able to use these features directly in MongoDB. It seems since you have enabled other lucene analyzers, is it a large lift to enable ngram type patterns/results? |
| Comment by Alex B. [ 26/Mar/20 ] |
|
We need this feature in an on-premises solution. We have a quick search in our application that scans fields like first name, last name, date of birth, street, zip code, etc. A text index is perfect for this. But when the user enters something into the quick search, the results should be displayed immediately. For this use case we need the possibility to search for prefixes, too. Other solutions are just hacks which unnecessarily pollute the data model and increase the data volume. We do not want to set up and maintain additional software like Elasticsearch or Solr for such a fundamental feature. Unfortunately this feature is only available in Atlas. However, this feature is very important for our application, because our customers are already using it intensively in other applications (with other databases). Due to Google and co this kind of search is expected by our customers. We would appreciate it very much if this feature were also available on-prem. Of course we would like to participate in further discussions. We are also looking forward to further ideas on how we can implement our requirements correctly. Unfortunately a contribution under https://mongodb.canny.io/searchbeta is no longer possible, because the link is not valid anymore. |
| Comment by Timothy Frietas (Inactive) [ 16/Jul/19 ] |
|
There are no plans to support $searchBeta for on-prem installations and FTS is currently planned to be an Atlas-only feature indefinitely. As with all features during beta we are considering the overall experience and future scope and welcome feedback and the opportunity to have conversations with customers who are seeking an on-prem solution to better understand their needs and inform our future roadmap. We welcome feature requests and suggestions for Atlas Search. You can create and upvote feature requests or suggestions here: |
| Comment by Santhosh [ 16/Jul/19 ] |
|
Hi Timothy, is the full-text search coming to on-premises MongoDB instances or only to Atlas? |
| Comment by Timothy Frietas (Inactive) [ 15/Jul/19 ] |
|
Our new Full Text Search feature is now in beta (using the $searchBeta operator). While the feature is Atlas-only and does not yet support all cluster sizes (it is currently only supported in M30 cluster sizes and above), we are working on both expanding hardware support and extending functionality. Many of the use cases described here are already supported in the beta. Please see https://docs.atlas.mongodb.com/reference/full-text-search/term/ for more info. If you want to try out a Full Text Search on a cluster of your own you can use credit activation code MONGODB4DOT2 for $200 of Atlas credit to get started. |
| Comment by angieduan [ 08/Jul/19 ] |
|
This is a feature needed by almost everyone, so is there a plan for it? Matching with regex is extremely slow on big collections... |
| Comment by HivePoint Admin [ 12/Apr/19 ] |
|
FYI, we have a similar need and solved this problem by creating an additional field in the record that includes ngrams for each of the words in the text field(s). (If you're not familiar with n-grams, they are trivial – just all of the partial word substrings of each word. See https://en.wikipedia.org/wiki/N-gram.) This new field is added as an additional field in the full-text index. This works great for us and solves the "as-you-type" problem. Obviously, it adds a significant amount of text in the index, but in our case this isn't a problem. (We are also careful about only including unique n-grams and limiting their length.) |
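A minimal sketch of the n-gram generation described above, assuming lowercase matching and hypothetical length limits of 2 to 8 characters (the comment does not say which limits were used). The function name `word_ngrams` is illustrative.

```python
import re


def word_ngrams(text, min_len=2, max_len=8):
    """Return the unique substrings of each word, in first-seen order."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for word in re.findall(r"\w+", text.lower()):
        # words shorter than min_len produce no n-grams at all
        for size in range(min_len, min(max_len, len(word)) + 1):
            for start in range(len(word) - size + 1):
                seen.setdefault(word[start:start + size], None)
    return list(seen)
```

The returned list would be stored in the extra field before the document is saved. The number of n-grams grows roughly with the square of the word length, which is why capping `max_len` and deduplicating matter for index size.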
| Comment by Ygor Lemos [ 04/Feb/19 ] |
|
up! |
| Comment by Govardhan [ 18/Oct/18 ] |
|
I have been searching Google for a fix to a similar issue in MongoDB. I have been developing a couple of APIs at my company where we have search criteria on partial keywords. Please release this feature soon. |
| Comment by Dan Delaney [ 06/Aug/18 ] |
|
I just signed up for this to comment on this issue - we have a lot of search happening on our website and this would be super helpful. Please make this happen soon! |
| Comment by oscar milla bret [ 29/Jul/18 ] |
|
Important for me too |
| Comment by Mattia Alfieri [ 11/Feb/18 ] |
|
I think this feature is really important. I've personally built several APIs for various companies and products, all backed by MongoDB, and the single issue I've had is the lack of this feature. Sometimes you can settle for $regex, but not every time. Even simple SaaS APIs need partial word matching on a full text index, and the current solution is to use software like Elasticsearch or Solr in combination with MongoDB, which really makes costs go up for smaller companies (more servers / more training / hire an expert) |
| Comment by Dan Syrstad [ 19/Dec/17 ] |
|
+1 This request would come close to satisfying our needs, but what would really be great is the Lucene/Solr/Elasticsearch wildcard capability: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Wildcard%20Searches. |
| Comment by Sunil K Samanta [ 05/Dec/17 ] |
|
Stuck at this point. This is seriously needed. |
| Comment by Ygor Lemos [ 18/Jul/17 ] |
|
+42. This is one of the main reasons a lot of people need to spin up elasticsearch/solr instances together with mongo. Having partial word support would allow the vast majority of full text searching cases to be handled directly by mongo since there's already support for stemming, tokenization and search scoring. |
| Comment by Eduard Bosch [ 23/May/17 ] |
|
Maybe a practical solution for now could be to implement in our databases the approach explained under "Keywords – partial (and case insensitive) searches" in the following article? This could help with indexing texts and searching efficiently by multiple keywords, with partial, case-insensitive matching. |
| Comment by Billy Tetrud [ 08/Apr/16 ] |
|
Yes please, this is what I expected the text search query to do by default. I'm disappointed this doesn't exist yet : / |
| Comment by Jim Liddell [ 24/Oct/14 ] |
|
I have also found the full-text search support lacking in this area. In my case, we have an application for managing customer data which is stored in Mongo. Our search covers various fields within the customer collection including the name, email and postcode. Let's assume that I have a customer document that looks something like this:

With a text index over these fields, I would ideally like a search for 'joe bloggs' to match the above document. However, as it stands, this does not work. It would work if the email address happened to be in the form 'joe.bloggs@blah.com' because the '.' characters cause the separate components to be indexed as separate words, but this is not always the case. Supporting regular expressions when using text search would solve this:
})
As would some kind of additional option:
} })

Without this, I have dropped back to indexing the fields individually (a compound index does not appear to work for our queries), and am relying on index intersection to satisfy the queries.

Another option I spiked out was to follow the approach described at http://docs.mongodb.org/manual/tutorial/model-data-for-keyword-search/, and maintain a keywords property against each document. Into the keywords property I added all possible sub-strings (above 2 characters) of the fields I was interested in. I could then create a single field index against the keywords field, and execute queries against that field alone. This worked (and gives faster performance than index intersection with regex queries), but I wasn't comfortable pushing the responsibility of managing keywords into the application, where this feels like it is a responsibility of the data storage technology. Supporting partial matches on full text search would negate the need for this at all. |
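The keywords workaround from the last paragraph could be sketched like this in Python. The customer document used in the usage note is hypothetical (the original example was lost in this export), as are the helper names. The sketch assumes lowercase matching and reads "above 2 characters" as a minimum length of 3.

```python
def keywords_for(document, fields, min_len=3):
    """Collect every substring of at least min_len characters from the chosen fields."""
    keywords = set()
    for field in fields:
        value = str(document.get(field, "")).lower()
        for start in range(len(value)):
            for end in range(start + min_len, len(value) + 1):
                keywords.add(value[start:end])
    return sorted(keywords)


def keyword_query(search_text):
    """Build a filter that requires every search term to appear in the keywords array."""
    return {"keywords": {"$all": search_text.lower().split()}}
```

For a hypothetical customer `{"name": "Joseph Bloggs", "email": "joebloggs@blah.com"}`, the keywords array contains both "joe" and "bloggs", so the filter built by `keyword_query("joe bloggs")` would match it through a single-field index on `keywords`. The cost is the one raised above: the array grows quadratically with field length and has to be maintained by the application.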