[SERVER-60100] Estimate cost of migrating to PCRE2 Created: 20/Sep/21  Updated: 13/Oct/21  Resolved: 13/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kyle Suarez Assignee: Jennifer Peshansky (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-48232 PCRE is not maintained anymore, upgra... Closed
Sprint: QE 2021-10-04, QE 2021-10-18
Participants:

 Description   

This task is to do some exploratory work to help determine the size of the PCRE2 upgrade project. As part of this work, we should:

  • Document which PCRE calls currently exist in the server codebase
  • Compare with the new PCRE2 API and determine how difficult it would be to change over
  • Attempt to add PCRE2 as a server dependency and try to compile it

The output of this ticket should be some guidance as to how many engineers and how long it would take to perform the full PCRE2 upgrade.



 Comments   
Comment by Jennifer Peshansky (Inactive) [ 08/Oct/21 ]

Locally, I've successfully compiled the pcre2 dependency and added #include <pcre2.h> everywhere we include pcre.h. The only caveat is that pcre2 requires the macro PCRE2_CODE_UNIT_WIDTH to be defined before the header file is included.

pcrecpp.h is another matter. pcre2 doesn't have an official C++ library, as noted by SERVER-48232, so we would have to use an external library like jpcre2 or pcre2pp. This would be a second dependency on top of pcre2 itself. I'll check whether each of these compiles.

Comment by Jennifer Peshansky (Inactive) [ 28/Sep/21 ]

I believe this project should take about 1-2 sprints for all of the work. Most of the functionality we use from PCRE has rough equivalents in PCRE2. I've made a document with details.

The main change we will have to incorporate is that the function for matching patterns now stores its results in a match_data block, which must be processed using API functions, rather than filling a caller-supplied vector of offsets to the substrings it has matched. Additionally, the process for freeing memory is different: there are specific functions for certain types of data, rather than just a generic pcre_free function. Other than that, the rest of the API we use has close equivalents in PCRE2.

Generated at Thu Feb 08 05:48:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.