  Core Server / SERVER-7218

Turn on the PCRE_UCP option in the PCRE build so that regex escapes such as \b, \B, and \d work with UTF-8 characters


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • None
    • Build, Querying
    • None
    • Query Execution
    • Minor Change
    • QE 2021-09-06, QE 2021-09-20, QE 2021-10-04, QE 2021-10-18, QE 2021-11-01, QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-01-24

    Description

      http://www.pcre.org/pcre.txt

      PCRE_UCP

      This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W,
      \w, and some of the POSIX character classes. By default, only ASCII
      characters are recognized, but if PCRE_UCP is set, Unicode properties
      are used instead to classify characters. More details are given in the
      section on generic character types in the pcrepattern page. If you set
      PCRE_UCP, matching one of the items it affects takes much longer. The
      option is available only if PCRE has been compiled with Unicode
      property support.

      Without this option, word-boundary assertions such as \b do not behave correctly when a word starts with a non-ASCII UTF-8 character (for example, É), because only ASCII characters are treated as word characters.
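As a rough illustration of the classification change described in the quoted documentation, the JavaScript sketch below compares an ASCII-only notion of a "word" character with a Unicode-property one. It is not PCRE itself: it assumes that JavaScript's \w (which is ASCII-only, like PCRE's default) stands in for PCRE without PCRE_UCP, and that the class [\p{L}\p{N}_] approximates \w with PCRE_UCP set (letters, numbers, and underscore, as described in the pcrepattern page). The helper name classify is hypothetical.

```javascript
// Sketch only: approximates PCRE's two modes with JavaScript regexes.
// ASCII rules (PCRE default): JS \w is [A-Za-z0-9_].
const asciiWord = /^\w$/;
// Assumed PCRE_UCP rules: any Unicode letter (\p{L}), number (\p{N}), or "_".
const unicodeWord = /^[\p{L}\p{N}_]$/u;

// Hypothetical helper: report how one character is classified in each mode.
function classify(ch) {
  return {
    char: ch,
    ascii: asciiWord.test(ch),
    unicode: unicodeWord.test(ch),
  };
}

for (const ch of ["e", "É", "ß", "7", "_", " "]) {
  console.log(classify(ch));
}
```

Because "É" is a non-word character under ASCII rules, the position between "É" and "c" in "Écologie" counts as a word boundary, which is why \b behaves unexpectedly in the session below.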

      Adapted from https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/owqLT6b-weE

      so@local(2.2.0) > db.subjects.find( { labelfr: /colo/ })
      { "_id" : ObjectId("5069baa4b049b18f5c52d1ac"), "labelfr" : "Écologie" }
      { "_id" : ObjectId("5069bb78b049b18f5c52d1ae"), "labelfr" : "word Écologie" }
      { "_id" : ObjectId("5069bcd7b049b18f5c52d1af"), "labelfr" : "word ecologie" }
      Fetched 3 record(s) in 5ms
      

      but with \b and \B:

      so@local(2.2.0) > db.subjects.find( { labelfr: /\bcolo/ })
      { "_id" : ObjectId("5069baa4b049b18f5c52d1ac"), "labelfr" : "Écologie" }
      { "_id" : ObjectId("5069bb78b049b18f5c52d1ae"), "labelfr" : "word Écologie" }
      Fetched 2 record(s) in 13ms
      so@local(2.2.0) > db.subjects.find( { labelfr: /\Bcolo/ })
      { "_id" : ObjectId("5069bcd7b049b18f5c52d1af"), "labelfr" : "word ecologie" }
      Fetched 1 record(s) in 9ms
      so@local(2.2.0) > db.subjects.find( { labelfr: /\BÉcolo/ })
      { "_id" : ObjectId("5069baa4b049b18f5c52d1ac"), "labelfr" : "Écologie" }
      { "_id" : ObjectId("5069bb78b049b18f5c52d1ae"), "labelfr" : "word Écologie" }
      Fetched 2 record(s) in 9ms
      so@local(2.2.0) > db.subjects.find( { labelfr: /\bÉcolo/ })
      Fetched 0 record(s) in 6ms
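The session can be reproduced outside MongoDB as a sketch, because JavaScript's \b and \B are also ASCII-only. The first set of patterns should select the same values as the shell output above. The second set is a hypothetical emulation of PCRE_UCP built from lookaround assertions over the assumed word class [\p{L}\p{N}_]; it needs a JavaScript engine with lookbehind and Unicode property escapes (for example, a recent Node.js). The names docs, run, asciiPatterns, and ucpPatterns are illustrative only.

```javascript
// Sample values of the labelfr field from the shell session above.
const docs = ["Écologie", "word Écologie", "word ecologie"];

// JS \b and \B use ASCII word characters, like PCRE without PCRE_UCP.
const asciiPatterns = {
  "\\bcolo": /\bcolo/,
  "\\Bcolo": /\Bcolo/,
  "\\BÉcolo": /\BÉcolo/,
  "\\bÉcolo": /\bÉcolo/,
};

// Assumed emulation of PCRE_UCP: word character = \p{L}, \p{N}, or "_".
const W = String.raw`[\p{L}\p{N}_]`;
// Unicode-aware \b: word char on exactly one side of the position.
const B = `(?:(?<=${W})(?!${W})|(?<!${W})(?=${W}))`;
// Unicode-aware \B: word chars on both sides, or on neither side.
const NB = `(?:(?<=${W})(?=${W})|(?<!${W})(?!${W}))`;
const ucpPatterns = {
  "\\bcolo": new RegExp(`${B}colo`, "u"),
  "\\Bcolo": new RegExp(`${NB}colo`, "u"),
  "\\BÉcolo": new RegExp(`${NB}Écolo`, "u"),
  "\\bÉcolo": new RegExp(`${B}Écolo`, "u"),
};

// For each pattern, list the sample values it matches.
function run(patterns) {
  const out = {};
  for (const [name, re] of Object.entries(patterns)) {
    out[name] = docs.filter((d) => re.test(d));
  }
  return out;
}

console.log("ASCII rules:", run(asciiPatterns));
console.log("UCP-style rules:", run(ucpPatterns));
```

Under this emulation, \bÉcolo matches both "Écologie" values and \bcolo matches none, which is the behavior the issue asks for. Lookarounds are used here because JavaScript has no flag that makes \b Unicode-aware.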
      


          People

            Assignee: Backlog - Query Execution
            Reporter: Asya Kamsky
            Votes: 9
            Watchers: 14
