Investigate changes in SERVER-86326: Increase max regex pattern length to 32k

    • Type: Investigation
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Tools and Replicator
    • 4

      Original Downstream Change Summary

      This change increases the maximum length of patterns usable in the $regex operator from 16384 to 32764 bytes. This pattern length limit is now the maximum possible value that is supported by the PCRE2 library.
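      The following is a minimal client-side pre-check. It assumes the limit is compared against the UTF-8 byte length of the pattern string. The helper names (pattern_byte_length, fits_limit) are hypothetical and are not part of MongoDB or any driver. Only the two limit values come from this summary.

```python
# Hypothetical client-side pre-check; not part of MongoDB or any driver.
# Limits come from the change summary and are measured in UTF-8 bytes,
# not in characters.
OLD_MAX_PATTERN_BYTES = 16384  # artificial limit introduced in v6.1
NEW_MAX_PATTERN_BYTES = 32764  # limit after SERVER-86326


def pattern_byte_length(pattern: str) -> int:
    """Return the length of the pattern in UTF-8 bytes."""
    return len(pattern.encode("utf-8"))


def fits_limit(pattern: str, limit: int = NEW_MAX_PATTERN_BYTES) -> bool:
    """True if the pattern does not exceed the byte limit.

    A pattern exactly at the limit passes. Passing is necessary but not
    sufficient: PCRE2 may still reject the pattern for other reasons.
    """
    return pattern_byte_length(pattern) <= limit


if __name__ == "__main__":
    # Alternation of 2000 ten-digit IDs: 2000*10 digits + 1999 '|' + 4 anchor chars.
    ids = [f"{i:010d}" for i in range(2000)]
    pattern = "^(" + "|".join(ids) + ")$"
    print(pattern_byte_length(pattern))
    print(fits_limit(pattern, OLD_MAX_PATTERN_BYTES))
    print(fits_limit(pattern, NEW_MAX_PATTERN_BYTES))
```

      Running it prints 22003, False, True. A 22003-byte alternation exceeds the 16384-byte limit but fits under the new 32764-byte limit.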

      This is a bugfix to restore compatibility with MongoDB versions before 6.1.
      In MongoDB version 6.1, the PCRE regex library was upgraded from version 1 to version 2, and an artificial pattern length limit of 16384 bytes was introduced. This caused a regression because previously working queries with long regex patterns stopped working in v6.1.

      The effectively usable pattern length may be less than 32764 bytes, because the PCRE2 regex library can impose further limitations on the pattern.
      Also note that the maximum pattern length is measured in bytes, not characters. This distinction matters for patterns that contain non-ASCII Unicode characters, each of which occupies two to four bytes in UTF-8.
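      To make the byte-versus-character distinction concrete, this sketch computes how many repetitions of a single character fit within 32764 bytes. It assumes the pattern is UTF-8 encoded, as BSON strings are. The function max_repeats is a hypothetical helper for illustration only.

```python
# Illustrative only: how many copies of one character fit in the byte limit,
# depending on that character's UTF-8 encoded width.
MAX_PATTERN_BYTES = 32764


def max_repeats(ch: str, limit: int = MAX_PATTERN_BYTES) -> int:
    """Largest n such that ch * n stays within the byte limit."""
    return limit // len(ch.encode("utf-8"))


# 'a', 'é', '€' and an emoji: 1, 2, 3 and 4 bytes per character in UTF-8.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    width = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {width} byte(s) per character, "
          f"at most {max_repeats(ch)} characters")
```

      A pattern made of 16382 "é" characters has a len() of 16382 in Python but already occupies the full 32764 bytes.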

      Description of Linked Ticket

      After upgrading our database, several of our queries started failing with the error message:

      "Regular expression is invalid: pattern string is longer than the limit set by the application"

      I've traced the relevant code to the following lines:

      https://github.com/mongodb/mongo/blob/739b0b5f8e53f09e916a57d4b45b8f7dbbddb211/src/mongo/util/pcre.cpp#L130

      https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_compile.c#L10226C24-L10230

      So I am fairly confident this problem was introduced by the following commit:

      https://github.com/mongodb/mongo/commit/468f41278b6b30aa602e81010cf7ef7973d97e4d

      A configuration option to change this limit would resolve this regression for us.

            Assignee:
            Unassigned
            Reporter:
            Backlog - Core Eng Program Management Team
            Votes:
            0
            Watchers:
            1
