Remove trailing slashes from $function regex handling in query_tester-1

    • Query Integration
    • Fully Compatible
    • ALL

      Summary

      The query-correctness-tests-4 reference files contain $regularExpression pattern values with a spurious trailing forward slash — e.g. "pattern":"methodologies|Advanced|extend|SMS/" — wherever the result was produced by a $function aggregation expression returning a regex literal. This trailing slash is not part of the regex pattern; it is an artifact of a long-standing bug in ValueWriter::toRegEx() in the MozJS scripting engine, which was the engine used to generate the reference files.

      Root Cause

      toRegEx() reconstructs a BSON regex from a SpiderMonkey regex object by calling toString() (which returns "/pattern/flags") and then splitting on the last /:

      return JSRegEx(regexStr.substr(1, regexStr.rfind('/')),   // BUG: count == position
                     regexStr.substr(regexStr.rfind('/') + 1)); 

      std::string::substr(pos, count) takes a length, not an end position. Because rfind('/') returns the index of the closing delimiter, passing it as the count yields a substring one character too long — it includes the closing delimiter itself. For /SMS/ this produces pattern SMS/ instead of SMS.
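
      The off-by-one can be reproduced outside the server with plain std::string. The following is a minimal sketch, not the actual ValueWriter code: RegexParts, splitBuggy, and splitFixed are hypothetical names introduced only for illustration, and the input is assumed to be a well-formed "/pattern/flags" string.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the {pattern, flags} pair that toRegEx() builds.
using RegexParts = std::pair<std::string, std::string>;

// Mirrors the buggy split: the index returned by rfind('/') is passed as a length.
RegexParts splitBuggy(const std::string& regexStr) {
    const std::size_t close = regexStr.rfind('/');
    return {regexStr.substr(1, close), regexStr.substr(close + 1)};
}

// Corrected split: the pattern occupies [1, close), so its length is close - 1.
RegexParts splitFixed(const std::string& regexStr) {
    const std::size_t close = regexStr.rfind('/');
    return {regexStr.substr(1, close - 1), regexStr.substr(close + 1)};
}
```

      For the input "/SMS/i", splitBuggy returns the pattern SMS/ with flags i, while splitFixed returns SMS with flags i. The flags half is unaffected because substr(close + 1) already starts after the closing delimiter.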

      This is the exact defect tracked in SERVER-98936 ("Fix encoding of regex patterns returned from $function"), which was closed Won't Fix in January 2025, and is the underlying cause reported in SERVER-98945 ("In-query functions incorrectly process regexes nested in arrays"), where the same \/-escaping ambiguity surfaces in the input direction.

      The reference files were generated against the MozJS engine while the bug was active. Every $regularExpression value that originated from a $function return went through toRegEx() and therefore has an extra trailing / in its "pattern" field. No collection data is affected — the .coll files contain no regex fields; all $regularExpression values in results come exclusively from $function computations.

      Fix

      1. Fix ValueWriter::toRegEx() by changing the count argument of the first substr() call from rfind('/') to rfind('/') - 1, so that only the pattern body is captured.
      2. Fix ValueWriter::_writeObject() to use JS::GetRegExpSource() and JS::GetRegExpFlags() directly, avoiding the toString() + rfind round-trip entirely, which eliminates ambiguity when the pattern itself contains \/.
      3. Update the 154 affected .results files in query-correctness-tests-4 to remove the trailing / from each "pattern" field.
      4. Bump query-correctness-tests-4 commit hash in test_repos.conf.
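
      Step 3 could be scripted rather than done by hand. The sketch below is one possible approach, not the tooling actually used for this ticket: stripTrailingSlash is a hypothetical helper, and it assumes each "pattern" key and value is serialized without whitespace, as in the example in the Summary. It walks the string value one JSON token at a time so that an escaped quote does not end the scan early and a JSON-escaped \/ is removed as a unit.

```cpp
#include <string>

// Hypothetical helper: drops one trailing '/' from every "pattern":"..." value
// found in a line of a .results file. Assumes no whitespace after the colon.
std::string stripTrailingSlash(std::string line) {
    const std::string key = "\"pattern\":\"";
    std::size_t pos = 0;
    while ((pos = line.find(key, pos)) != std::string::npos) {
        const std::size_t start = pos + key.size();
        std::size_t end = start;   // scan position inside the string value
        std::size_t last = start;  // start of the most recent token
        std::size_t lastLen = 0;   // 1 for a plain char, 2 for a backslash escape
        while (end < line.size() && line[end] != '"') {
            last = end;
            lastLen = (line[end] == '\\' && end + 1 < line.size()) ? 2 : 1;
            end += lastLen;
        }
        if (end >= line.size()) break;  // unterminated value: leave the rest alone
        // Remove the final token if it is "/" or the JSON escape "\/".
        if (lastLen > 0 && line[last + lastLen - 1] == '/') {
            line.erase(last, lastLen);
            end -= lastLen;
        }
        pos = end;  // continue searching after this value
    }
    return line;
}
```

      Such a pass is not idempotent: run a second time, it would also strip a slash that legitimately ends a pattern, so it should be applied exactly once to the files that still carry the artifact.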

      Related tickets

      • SERVER-98936 — original bug report for the trailing-slash artifact in $function regex returns (closed Won't Fix)
      • SERVER-98945 — adjacent manifestation of the same \/ escaping ambiguity for regexes in arrays passed to $function
      • SERVER-116052 — WASM JS engine work, where the cross-engine result discrepancy exposed the bug

            Assignee:
            Calvin Nguyen
            Reporter:
            Calvin Nguyen
            Votes:
            0
            Watchers:
            1