Core Server / SERVER-15090

Improve Text Indexes to support partial word match

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.6.4
    • Component/s: Text Search
    • Labels:
    • Query Integration

      As described at http://docs.mongodb.org/manual/core/index-text/, it is possible to create text indexes, search them, and sort the results by textScore, all of which works well.
      But as discussed in http://stackoverflow.com/questions/21018738/mongodb-fulltext-search-workaroud-for-partial-word-match and http://stackoverflow.com/questions/17887140/mongodb-full-text-search-matching-precesion, it would be very useful to have an option for partial word matching.

      The option could be specified either when creating the index or when querying the database. As one possible implementation, index creation could accept an additional option:

      db.reviews.ensureIndex( 
        { comments: "text" } ,
        { match: "partial" }
      )
      

      The options for match could be "whole" (default), "prefix", "postfix", and "partial". For example, suppose the stored word is "blueberry":
      Prefix: Behaves like the current search and additionally matches when the query is the beginning of a word. A query like "blue", "bl", or "blueb" would return "blueberry", whereas "berry", "eberry", or "erry" would not.
      Postfix: Exactly the reverse of "prefix": the query additionally matches when it is the end of a word.
      Partial: Matches the query at any position in the word, so "b", "ber", and "ue" would all return "blueberry".
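
      The four proposed modes can be pinned down with a small predicate. This is only a sketch of the intended semantics for a single, already tokenized word. The function name matchesWord is hypothetical, and the sketch ignores the stemming, stop-word handling, and case folding that the real text index applies.

```javascript
// Hypothetical helper (not a MongoDB API): does `query` match the stored
// word under the proposed match mode? The mode names come from the proposal.
function matchesWord(stored, query, mode = "whole") {
  switch (mode) {
    case "whole":   // current behaviour: the whole word must be equal
      return stored === query;
    case "prefix":  // query is the beginning of the word (or the whole word)
      return stored.startsWith(query);
    case "postfix": // query is the end of the word (or the whole word)
      return stored.endsWith(query);
    case "partial": // query occurs anywhere in the word
      return stored.includes(query);
    default:
      throw new Error("unknown match mode: " + mode);
  }
}
```

      With the stored word "blueberry", matchesWord("blueberry", "blue", "prefix") is true and matchesWord("blueberry", "berry", "prefix") is false, while "partial" accepts both. A whole-word query still matches under every mode, which is what "in addition" to the current behaviour means here.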

      In my opinion this would greatly improve full-text search in many use cases, for example building a search for a news site, because you would not have to add a dependency such as elasticsearch, which can be overkill in some scenarios.
      Further, it would be much better than regular expression search in these cases: you get the 'automatic' indexing and the sorting capabilities, and it is less error-prone because you don't have to write any regular expressions.
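
      For comparison, the regular-expression workaround means building a pattern per mode by hand, including escaping the user input. A rough sketch follows. The helper names escapeRegex and buildWordRegex are hypothetical, and the \b word boundary is only an ASCII approximation of how a text index splits words.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so that user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: emulate the proposed match modes with a RegExp.
function buildWordRegex(query, mode) {
  const q = escapeRegex(query);
  const patterns = {
    whole:   "\\b" + q + "\\b", // query is a complete word
    prefix:  "\\b" + q,         // query starts a word
    postfix: q + "\\b",         // query ends a word
    partial: q,                 // query appears anywhere
  };
  if (!Object.prototype.hasOwnProperty.call(patterns, mode)) {
    throw new Error("unknown match mode: " + mode);
  }
  return new RegExp(patterns[mode], "i"); // "i": case-insensitive
}

// In the mongo shell the result could be used in a query such as:
//   db.reviews.find({ comments: buildWordRegex("blue", "prefix") })
```

      A query built this way does not use the text index and produces no textScore to sort by, which is the gap the proposed option is meant to close.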
