[SERVER-32356] Use of options:x and comments with $regex search including \n can lead to incorrect documents being returned Created: 14/Dec/17 Updated: 27/Oct/23 Resolved: 29/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Byrne III | Assignee: | David Storch |
| Resolution: | Works as Designed | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Pass the following in a file to the mongo shell:
Summary of output:
|
| Sprint: | Query 2018-01-15 |
| Participants: |
| Description |
|
With a simple set of documents:
these two $regex expressions (identical except for use of quotes or slashes):
give different results, both incorrect, despite our $regex documentation suggesting that the syntaxes with the search strings wrapped in quotes or slashes are equivalent. Removing the option:x, the comment and the space leads to correct results from both forms. |
| Comments |
| Comment by David Storch [ 29/Dec/17 ] | |||||||||||||||||||||||||
|
Hi william.byrne,

After digging into the details, I actually believe that this is working as designed. There's a lot going on here, so let me unpack some of the details by focusing on two queries highlighted in the ticket description:

First query

Let's start by considering the first problem query:
If I understand correctly, you claim that this is the incorrect result set. Instead, you would expect the following:
In other words, you expect the newline character to be significant, but instead it is ignored. On the contrary, my understanding is that the "\n" character should be ignored here. The PCRE "x" option turns on "extended mode". Quoting from the documentation for PCRE_EXTENDED:
The "\n" is neither escaped nor inside a character class, so it seems correct that it is not considered part of the search pattern.

Second query

When I run this query against a recent version of 3.4, I get the following:
However, when I run the query against 3.6.0, it fails:
This is due to the changes made in

This raises a few questions. What does the error message "missing )" mean? Why is the regex valid when specified with quotes but invalid when specified with slashes?

I can't fully explain the "missing )" error message: it is fairly cryptic, but it is just the error string surfaced from the underlying PCRE compilation of the regex.

Regarding the latter question: the difference in behavior between quotes and slashes comes down to how the "\n" sequence is interpreted and to the semantics of the "#" character when PCRE_EXTENDED is enabled. When using the syntax {$regex: "\n"} in the shell, the server receives 0x0a, the ASCII newline character. On the other hand, when using the syntax {$regex: /\n/}, the server receives 0x5c 0x6e, the ASCII codes for "\" and "n". I believe this is just JavaScript behavior, rather than something MongoDB-specific. In any case, this encoding difference is produced by the shell, not the server.

PCRE_EXTENDED causes the "#" character to be interpreted specially, as the start of a comment. Quoting again from the PCRE manual:
There are two interesting facts here. The first is that a newline character is used together with "#" in order to terminate a comment. It appears that PCRE requires this terminator, and the regular expression is considered invalid without one:
Second, only the character 0x0a can terminate a comment; the sequence 0x5c 0x6e does not cut it. Putting everything together, we see that this query correctly fails because the comment is not terminated.

Conclusion

I believe the above should also explain the behavior of the remaining queries in the repro script. I am closing this ticket as Works as Designed and linking as related to

Best, |
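
The quotes-versus-slashes encoding difference described in the comment above can be checked in plain JavaScript, independent of the server. The following is a minimal sketch, not one of the ticket's queries: the pattern `#c\n` and the helper `hexCodes` are hypothetical and exist only for illustration. It relies solely on standard JavaScript behavior (string-literal escapes and `RegExp.prototype.source`) and assumes, as the comment states, that the shell passes the regex literal's pattern text through unchanged.

```javascript
// Hypothetical helper (not a shell built-in): hex-dump a string's UTF-16 code units.
function hexCodes(s) {
  return Array.from(s, (ch) => "0x" + ch.charCodeAt(0).toString(16).padStart(2, "0")).join(" ");
}

// Hypothetical pattern for illustration; not one of the ticket's queries.
const quoted = "#c\n";          // string literal: the parser resolves \n to one newline character
const slashed = /#c\n/.source;  // regex literal: the pattern text keeps the backslash and the "n"

console.log(hexCodes(quoted));   // 0x23 0x63 0x0a
console.log(hexCodes(slashed));  // 0x23 0x63 0x5c 0x6e

// Only the quoted form contains the 0x0a that can terminate a "#" comment.
console.log(quoted.includes("\n"));   // true
console.log(slashed.includes("\n"));  // false
```

The snippet uses `console.log`, so it runs as written in Node.js or mongosh; in the legacy mongo shell, `print` would be the likely substitute.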