Allow regex word character (\w) and word boundary (\b) escapes to be Unicode-aware


    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 3.0.11
    • Component/s: Querying
    • Query Optimization

      Provide a way to use regular expressions in MongoDB where the word character (\w) and word boundary (\b) escapes work for code points greater than or equal to 256.

      Original description

      $regex word boundary fails by treating the Danish character ø as a non-word character

      db.collection.find({ "name" : { "$regex" : ".*\\bden\\b.*" , "$options" : "i"} })
      

      returns a document that should not match, because \b matches between "ø" and "d" in "Døden":

      {  "name": "Death Is A Caress(Døden Er Et Kjærtegn).sub" }
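
      The same behaviour can be reproduced outside MongoDB. The following is a minimal sketch in plain JavaScript, whose built-in \w and \b are also ASCII-only; it is not MongoDB's regex engine (which is PCRE-based), so it only illustrates the semantics being requested. The variable names and the lookaround-based pattern are assumptions made for illustration, not part of the original report.

```javascript
// Sample value taken from the report above.
const name = "Death Is A Caress(Døden Er Et Kjærtegn).sub";

// ASCII-only semantics: \w is [A-Za-z0-9_], so "ø" counts as a non-word
// character and \b matches between "ø" and "d".
const asciiBoundary = /\bden\b/i;

// Illustrative Unicode-aware stand-in for \b...\b: treat any letter,
// combining mark, digit, or underscore as a word character.
const unicodeBoundary = /(?<![\p{L}\p{M}\p{N}_])den(?![\p{L}\p{M}\p{N}_])/iu;

const asciiMatch = asciiBoundary.test(name);      // "den" found inside "Døden"
const unicodeMatch = unicodeBoundary.test(name);  // "ø" blocks the match
const positiveControl = unicodeBoundary.test("Hvor er den?");

console.log(asciiMatch, unicodeMatch, positiveControl); // → true false true
```

      With Unicode-aware word boundaries, "den" inside "Døden" is no longer reported as a whole word, while a standalone "den" still matches.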
      

            Assignee:
            [DO NOT USE] Backlog - Query Optimization
            Reporter:
            Nic Cottrell (Personal) (Inactive)
            Votes:
            2
            Watchers:
            15
