- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Text Search
- Query Optimization
Using a regex for a String.contains-style search is slow. Text search only matches on word boundaries, so it returns no results for partial string matches.
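To illustrate the gap described above, here is a minimal sketch in plain Python. It is a simplified model, not MongoDB's actual text search: the helper names (`word_tokens`, `word_search`, `contains_search`) are made up for this example, and real text indexes also apply stemming and stop words.

```python
import re

def word_tokens(text):
    """Split on non-word characters and lowercase (simplified word-based model)."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def word_search(docs, term):
    """Match only when the term equals a whole word in the document."""
    return [d for d in docs if term.lower() in word_tokens(d)]

def contains_search(docs, term):
    """Substring match; without an index this has to scan every document."""
    return [d for d in docs if term.lower() in d.lower()]

docs = ["MongoDB text search", "ngram tokenizer"]
whole_word = word_search(docs, "search")   # the full word matches
partial_word = word_search(docs, "sear")   # a partial word finds nothing
partial_scan = contains_search(docs, "sear")
```

In this model the whole-word query finds the first document, the partial query finds nothing, and only the full scan recovers the partial match.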
If MongoDB were to add an n-gram index (see http://lucene.apache.org/solr/guide/7_1/tokenizers.html), searches using String.contains could be about as fast as a "prefix expression", a.k.a. an anchored regex equivalent to String.startsWith (/^/). Of course, people would have to be careful about index size. One option would be to specify a maximum length for the indexed field: if a document insert or update exceeds that length, the write operation would fail and state the reason ("string too long for ngram index with max size n").
Additionally, when creating the index, one would need to specify whether the field should be automatically cast to lowercase or uppercase.
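The proposal could be sketched roughly as follows, as an in-memory illustration in plain Python. Nothing here is an existing MongoDB feature or API: the class `NgramIndex` and its parameters `n`, `max_length`, and `fold_case` are hypothetical names chosen for this example, and interpreting "max size n" as the maximum field length is an assumption.

```python
from collections import defaultdict

class NgramIndex:
    """Hypothetical n-gram inverted index for substring ("contains") lookups."""

    def __init__(self, n=3, max_length=64, fold_case=True):
        self.n = n                    # gram size (assumed default: trigrams)
        self.max_length = max_length  # longest string the index accepts
        self.fold_case = fold_case    # cast to lowercase at index and query time
        self._postings = defaultdict(set)  # gram -> ids of docs containing it
        self._docs = {}                    # id -> normalized field value

    def _normalize(self, value):
        return value.lower() if self.fold_case else value

    def _grams(self, value):
        return {value[i:i + self.n] for i in range(len(value) - self.n + 1)}

    def insert(self, doc_id, value):
        # Reject oversized values so the index size stays bounded.
        if len(value) > self.max_length:
            raise ValueError(
                f"string too long for ngram index with max size {self.max_length}"
            )
        text = self._normalize(value)
        self._docs[doc_id] = text
        for gram in self._grams(text):
            self._postings[gram].add(doc_id)

    def contains(self, needle):
        needle = self._normalize(needle)
        if len(needle) < self.n:
            # Too short to form a gram: fall back to scanning every document.
            candidates = set(self._docs)
        else:
            # Only documents that hold every gram of the needle can match.
            sets = [self._postings.get(g, set()) for g in self._grams(needle)]
            candidates = set.intersection(*sets)
        # Verify candidates, since sharing all grams does not imply adjacency.
        return sorted(d for d in candidates if needle in self._docs[d])

idx = NgramIndex(n=3, max_length=32)
idx.insert(1, "MongoDB Text Search")
idx.insert(2, "Solr NGram Tokenizer")
hits_gram = idx.contains("gram")
hits_sear = idx.contains("SEAR")
```

The intersection step narrows the candidates without scanning every document, and the final check filters out false positives. This sketch does not remove stale grams when a document is updated, which a real index would have to handle.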