Core Server / SERVER-30163

Additional terms in phrase search are implicitly ignored.


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 3.4.6
    • Fix Version/s: Backlog
    • Component/s: Text Search
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:

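      db.stores.insert(
         [
           { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
           { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
           { _id: 3, name: "Coffee Shop", description: "Just coffee" },
           { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
           { _id: 5, name: "Java Shopping", description: "Indonesian goods" }
         ]
      )
      // A $text query requires a text index. The report does not show one,
      // so indexing both fields here is an assumption.
      db.stores.createIndex( { name: "text", description: "text" } )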
      

      Then run the following text search:

      db.stores.find( { $text: { $search: "java \"coffee shop\"" } } )
      

      Only the one document containing the phrase "coffee shop" (_id: 3) is returned. Documents that contain the word "java" but not the phrase (_id: 1 and _id: 5) are not returned. In fact, including the term "java" in the search has no effect on which documents are returned.


      Description

      As currently specified, the logic for phrase matching implicitly ignores any additional terms in the $search string.

      For example, the $search string:

      "\"ssl certificate\" authority key"

      is compiled to the following search:

      "ssl certificate" AND ("authority" OR "key" OR "ssl" OR "certificate")

      As the compiled search shows, every term other than the phrase "ssl certificate" is effectively ignored. Any string that contains the phrase also contains "ssl" and "certificate", so the OR clause is always satisfied and adds no constraint. The search therefore matches exactly the strings that contain the phrase. In particular, strings that contain the additional terms "authority", "key", or both will not match unless they also contain the phrase.
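
      The compiled form above can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not the server's implementation: parseSearch, tokenize, and matchesCurrent are hypothetical helper names, and the model assumes case-insensitive matching with no stemming or stop words.

```javascript
// Simplified model of the current behavior; NOT MongoDB's implementation.
// Assumptions: case-insensitive, no stemming, no stop-word removal.

// Lowercase a string and split it into words.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Split a $search string into quoted phrases and loose terms.
function parseSearch(search) {
  const phrases = [];
  const rest = search.replace(/"([^"]*)"/g, (_, phrase) => {
    phrases.push(phrase);
    return " ";
  });
  return { phrases, terms: tokenize(rest) };
}

// Current semantics: every phrase must occur, AND at least one word from
// (loose terms + the words inside the phrases) must occur.
function matchesCurrent(text, search) {
  const { phrases, terms } = parseSearch(search);
  const words = tokenize(text);
  const padded = " " + words.join(" ") + " ";
  const allTerms = terms.concat(phrases.flatMap(tokenize));
  const hasPhrases = phrases.every((p) =>
    padded.includes(" " + tokenize(p).join(" ") + " ")
  );
  const hasAnyTerm = allTerms.some((t) => words.includes(t));
  return hasPhrases && hasAnyTerm;
}

const search = '"ssl certificate" authority key';
console.log(matchesCurrent("Renew the SSL certificate", search)); // true
console.log(matchesCurrent("certificate authority key", search)); // false
```

      In this model, removing "authority" and "key" from the search string gives the same result for any input text, which is the behavior this issue reports.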

      There are two problems with this behavior:

      1. It is surprising, counter-intuitive, and inconsistent with the behavior of regular text searches that do not contain phrases. As a user, I would not expect additional search terms to be silently ignored without any warning. This is especially surprising given that a search without phrases combines its terms with a logical "OR".
      2. It is less powerful than it could be. Specifying additional terms should allow the user to make meaningful refinements to their search.

      Therefore, I propose that a $search string containing a phrase perform a logical "OR" between that phrase and every other phrase or term in the string.

      So, for example, the string above will compile to:

      "ssl certificate" OR "authority" OR "key"

      This behavior is consistent with the behavior of regular text searches (that do not contain phrases), and provides additional functionality when the user adds additional terms, instead of implicitly ignoring them as is done currently.
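
      To illustrate the proposal, here is a minimal sketch of the OR semantics applied to the documents from the steps to reproduce. It is a hypothetical model in plain JavaScript, not server code: matchesProposed and the other helper names are made up for this example, and it assumes case-insensitive matching with no stemming or stop words.

```javascript
// Hypothetical model of the PROPOSED semantics; not MongoDB's implementation.
// Assumptions: case-insensitive, no stemming, no stop-word removal.
const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

// Lowercase a string and split it into words.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Split a $search string into quoted phrases and loose terms.
function parseSearch(search) {
  const phrases = [];
  const rest = search.replace(/"([^"]*)"/g, (_, phrase) => {
    phrases.push(phrase);
    return " ";
  });
  return { phrases, terms: tokenize(rest) };
}

// Proposed semantics: a document matches if ANY phrase or ANY loose term
// occurs in one of its indexed fields (name, description).
function matchesProposed(doc, search) {
  const { phrases, terms } = parseSearch(search);
  const fields = [doc.name, doc.description].map(tokenize);
  const hasPhrase = phrases.some((p) =>
    fields.some((w) =>
      (" " + w.join(" ") + " ").includes(" " + tokenize(p).join(" ") + " ")
    )
  );
  const hasTerm = terms.some((t) => fields.some((w) => w.includes(t)));
  return hasPhrase || hasTerm;
}

const ids = stores
  .filter((doc) => matchesProposed(doc, 'java "coffee shop"'))
  .map((doc) => doc._id);
console.log(ids); // [ 1, 3, 5 ]
```

      Under this model the search returns the two "java" documents as well as the "coffee shop" document, whereas the behavior reported above returns only _id: 3.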

      Finally, the proposed behavior matches what parts of the manual currently describe; those parts are wrong with respect to the existing behavior. See for example DOCS-10382.
