[SERVER-32157] add NGRAM index Created: 04/Dec/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Text Search
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Ronald Feicht Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 7
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

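Using a regex for a String.contains-style (substring) search is slow, because an unanchored pattern has to be tested against every indexed value or document. Text search only matches on word boundaries, so it returns no results for partial string matches.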

If MongoDB were to add an n-gram index (see http://lucene.apache.org/solr/guide/7_1/tokenizers.html), then String.contains-style searches would be as fast as a regex "prefix expression", i.e. a String.startsWith-style pattern anchored with ^ (such as /^abc/). Of course, people would have to be careful about index size. One option would be to let the user specify a maximum length for the indexed field; if a document insert or update exceeds that length, the write operation would fail and state the reason ("string too long for ngram index with max size n").

Additionally, one would need to specify whether to automatically cast the field to either lowercase or uppercase when creating the index.
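
To illustrate the request, here is a minimal in-memory sketch of how an n-gram index could answer substring queries. It is not MongoDB code and does not reflect any existing or planned MongoDB API: the class NgramIndex, its parameters n, max_length and lowercase, and the methods insert and contains are all hypothetical names chosen for this example. Each stored string is split into overlapping n-grams, a lookup intersects the posting sets of the search term's n-grams, and the surviving candidates are verified with a plain substring check.

```python
from collections import defaultdict


class NgramIndex:
    """Toy in-memory n-gram index. Every name here is hypothetical."""

    def __init__(self, n=3, max_length=64, lowercase=True):
        self.n = n                        # n-gram size (assumed default: trigrams)
        self.max_length = max_length      # longest field value the index accepts
        self.lowercase = lowercase        # fold case at index time and query time
        self.postings = defaultdict(set)  # n-gram -> ids of documents containing it
        self.values = {}                  # doc id -> normalized field value

    def _normalize(self, text):
        return text.lower() if self.lowercase else text

    def _ngrams(self, text):
        # All overlapping substrings of length n; empty if text is shorter than n.
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def insert(self, doc_id, text):
        # Reject over-long values, as proposed in the description.
        if len(text) > self.max_length:
            raise ValueError(
                f"string too long for ngram index with max size {self.max_length}"
            )
        value = self._normalize(text)
        self.values[doc_id] = value
        for gram in self._ngrams(value):
            self.postings[gram].add(doc_id)

    def contains(self, needle):
        needle = self._normalize(needle)
        grams = self._ngrams(needle)
        if grams:
            # Only documents that hold every n-gram of the needle can match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Needle shorter than n: nothing to look up, so scan every value.
            candidates = set(self.values)
        # Sharing all n-grams is necessary but not sufficient, so verify.
        return sorted(d for d in candidates if needle in self.values[d])


idx = NgramIndex(n=3, max_length=20)
idx.insert(1, "MongoDB Server")
idx.insert(2, "Lucene tokenizer")
idx.insert(3, "mongoose")
print(idx.contains("ongo"))  # ids of values containing "ongo", ignoring case
```

The sketch leaves out updates, deletes and persistence. It also shows the index-size concern raised above: every stored string contributes up to one posting entry per character position, which is why a maximum field length is part of the proposal.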

