Core Server / SERVER-29918

stemming behavior for diacritics causes incorrect results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: 3.4.4
    • Fix Version/s: None
    • Component/s: Text Search
    • Labels:
      None
    • Environment:
      ubuntu 16.04, mongodb 3.4.4
    • Operating System:
      ALL
    • Steps To Reproduce:
      > db.test.insertMany([  
         { "_id":1, "name":"iphone" },
         { "_id":2, "name":"iphône" },
         { "_id":3, "name":"iphonë" },
         { "_id":4, "name":"iphônë" }
      ])
       
       
      > db.test.ensureIndex({name: "text"})
       
      > db.test.find({$text: {$search: "iphone"}})
      { "_id" : 1, "name" : "iphone" }
      { "_id" : 2, "name" : "iphône" }
       
      > db.test.find({name: "iphone"}).collation({locale: "en", strength: 1})
      { "_id" : 1, "name" : "iphone" }
      { "_id" : 2, "name" : "iphône" }
      { "_id" : 3, "name" : "iphonë" }
      { "_id" : 4, "name" : "iphônë" }
      
      

    • Sprint:
      Query 2017-07-31, Query 2017-10-02, Query 2017-10-23, Query 2017-11-13

      Description

      $text search is not diacritic insensitive when a word contains a dieresis (¨): searching for "iphone" matches "iphône" but not "iphonë" or "iphônë". The dieresis is classified as a diacritic in the Unicode 8.0 Character Database property list; see http://www.unicode.org/Public/8.0.0/ucd/PropList.txt

      A query that uses a collation with strength: 1 returns all four documents.
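
      To illustrate the Unicode claim above, here is a minimal sketch in plain JavaScript (run in Node.js, not the mongo shell) that decomposes each name and removes every code point carrying the Unicode Diacritic property. The helper name stripDiacritics is hypothetical, and this is not MongoDB's implementation; it only shows that diacritic folding on its own would map all four names to the same string, which suggests the difference in $text results comes from another step, such as the stemming named in the issue title.

```javascript
// Hypothetical helper, for illustration only (not MongoDB's text-search code).
// NFD splits a precomposed letter such as "ë" into "e" + U+0308, and the
// regex then removes every code point with the Unicode Diacritic property.
function stripDiacritics(s) {
  return s.normalize("NFD").replace(/\p{Diacritic}/gu, "");
}

// The four names from the steps to reproduce.
const names = ["iphone", "iphône", "iphonë", "iphônë"];
const folded = names.map(stripDiacritics);

// Every element of `folded` is "iphone".
console.log(folded);
```

      Both the circumflex (U+0302) and the dieresis (U+0308) are removed by the same rule, so the two marks are not distinguished at this level.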
      
