  1. Core Server
  2. SERVER-29918

stemming behavior for diacritics causes incorrect results

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.4.4
    • Component/s: Text Search
    • Labels:
      None
    • Environment:
      ubuntu 16.04, mongodb 3.4.4
    • Operating System: ALL
    • Steps To Reproduce:
      > db.test.insertMany([  
         { "_id":1, "name":"iphone" },
         { "_id":2, "name":"iphône" },
         { "_id":3, "name":"iphonë" },
         { "_id":4, "name":"iphônë" }
      ])
      
      
      > db.test.ensureIndex({name: "text"})
      
      > db.test.find({$text: {$search: "iphone"}})
      { "_id" : 1, "name" : "iphone" }
      { "_id" : 2, "name" : "iphône" }
      
      > db.test.find({name: "iphone"}).collation({locale: "en", strength: 1})
      { "_id" : 1, "name" : "iphone" }
      { "_id" : 2, "name" : "iphône" }
      { "_id" : 3, "name" : "iphonë" }
      { "_id" : 4, "name" : "iphônë" }
      
      
    • Sprint: Query 2017-07-31, Query 2017-10-02, Query 2017-10-23, Query 2017-11-13

      $text search is not diacritic-insensitive when the word contains a dieresis ( ¨ ). The dieresis is categorized as a diacritic in the Unicode 8.0 Character Database PropList; see http://www.unicode.org/Public/8.0.0/ucd/PropList.txt

      Search with collation works fine with strength = 1.
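
To make the expectation behind this report concrete, here is a minimal sketch in plain JavaScript (no server required) of what diacritic folding does to the four test names. `foldDiacritics` is a hypothetical helper written for illustration only. It is not how the server's text index tokenizes or stems words; it only shows that, once combining marks are removed, all four names compare equal, which is what the collation query with strength 1 returns.

```javascript
// Hypothetical helper, not part of MongoDB: decompose each character
// (NFD) and strip combining marks in the range U+0300..U+036F, which
// includes the circumflex (U+0302) and the dieresis (U+0308).
function foldDiacritics(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// The four names inserted in the steps to reproduce.
const names = ["iphone", "iphône", "iphonë", "iphônë"];

// Fold each name; every entry becomes the unaccented form.
const folded = names.map(foldDiacritics);

console.log(folded); // [ 'iphone', 'iphone', 'iphone', 'iphone' ]
```

Under this naive folding, documents 3 and 4 would match "iphone" as well, so the difference seen with $text has to come from some other step in text indexing, as the issue title suggests.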
      

            Assignee:
            kyle.suarez@mongodb.com Kyle Suarez
            Reporter:
            felix2626 adrien petel
            Votes:
            1
            Watchers:
            11

              Created:
              Updated:
              Resolved: