[SERVER-67715] Change stream reader requires double escaping regexes Created: 30/Jun/22 Updated: 29/Oct/23 Resolved: 19/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Change streams |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.3, 6.1.0-rc3, 6.2.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Vishnu Kaushik | Assignee: | Kyle Suarez |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v6.1, v6.0 |
| Sprint: | QE 2022-09-19, QE 2022-10-03 |
| Participants: | |
| Description |
|
We can open a change stream to watch events from non-system collections in a database by excluding collection names that match the regular expression "^system\." (starts with "system", followed by an escaped "."). All code below was run in the mongo shell connected to a single-node replica set.
However, this is NOT returning events from collections whose names start with "system" but that are not actually system collections. Strangely, the same regex, when used in listCollections (or in an aggregate with $listCatalog) to show all non-system collections, works correctly ("system_js" is printed):
In the change stream case, it seems like we have to escape the backslash as well and use "^system
See comments for more info. |
| Comments |
| Comment by Githook User [ 30/Sep/22 ] | |||||||||||
|
Author: Kyle Suarez <kyle.suarez@mongodb.com> (ksuarz) Message: (cherry picked from commit c9fe899fff347770c0e30fa0272f6157be6676a8) | |||||||||||
| Comment by Githook User [ 19/Sep/22 ] | |||||||||||
|
Author: Kyle Suarez <kyle.suarez@mongodb.com> (ksuarz) Message: (cherry picked from commit c9fe899fff347770c0e30fa0272f6157be6676a8) | |||||||||||
| Comment by Githook User [ 19/Sep/22 ] | |||||||||||
|
Author: Kyle Suarez <kyle.suarez@mongodb.com> (ksuarz) Message: | |||||||||||
| Comment by Kyle Suarez [ 13/Sep/22 ] | |||||||||||
|
britt.snyman@mongodb.com, this is a query correctness bug and I think we should keep this as a release blocker. CC bernard.gorman@mongodb.com | |||||||||||
| Comment by Wenbin Zhu [ 06/Jul/22 ] | |||||||||||
|
Yeah I think 6.0.1 is fine. | |||||||||||
| Comment by Wenbin Zhu [ 01/Jul/22 ] | |||||||||||
|
bernard.gorman@mongodb.com Yes, we initially wanted to support replicating system.js, but recently, due to some limitations (one of them being the privilege needed to create/drop system collections), we decided not to support replicating any system collections for GA. I think we are going to add support for that after GA. | |||||||||||
| Comment by Bernard Gorman [ 01/Jul/22 ] | |||||||||||
But {showSystemEvents:true} should only be reporting events on system.js, which C2C specifically requested in the original scope? | |||||||||||
| Comment by Wenbin Zhu [ 01/Jul/22 ] | |||||||||||
C2C needs to add this filter because we use showSystemEvents in order to get create/createIndexes events due to chunk migration, which also generates events from system collections that we need to exclude. | |||||||||||
| Comment by Vishnu Kaushik [ 01/Jul/22 ] | |||||||||||
|
Yes, sorry bernard.gorman@mongodb.com, that is a typo (I've fixed it now) - when using $nin, "system_js" is NOT showing up with the regex /^system\. though it should. It will show up if we use the regex
. | |||||||||||
| Comment by Bernard Gorman [ 01/Jul/22 ] | |||||||||||
vishnu.kaushik@mongodb.com, did you mean that this is NOT returning events from collections like system_js? I believe this is due to how we rewrite the $match into a filter on the oplog. If I look at the explain output for the $changeStream pipeline with this filter, I see the following:
Looks like the escaped period is being resolved to a literal period before being applied in the regex, causing it to match anything that starts with "system" and has at least one additional character after it. (As an aside, it's worth noting that this filter isn't actually necessary, since change streams by default do not return any events on system collections.) | |||||||||||
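To illustrate the difference described above, here is a minimal sketch in plain JavaScript (no server involved). It only compares the intended pattern with the pattern that would result if the backslash were lost. The collection names and the `keep` helper are hypothetical, chosen for illustration, and the `filter` call merely imitates an exclusion filter like $nin. It is not the server's rewrite logic.

```javascript
// Hypothetical collection names, chosen only for illustration.
const names = ["system.js", "system_js", "systemjs", "orders"];

// Intended pattern: "system" followed by a literal dot.
const intended = /^system\./;

// Pattern with the backslash lost: "." now matches any single character.
const unescaped = /^system./;

// Imitate an exclusion filter: keep names that do NOT match the regex.
const keep = (re) => names.filter((name) => !re.test(name));

const keptIntended = keep(intended);   // "system_js" and "systemjs" survive
const keptUnescaped = keep(unescaped); // they are wrongly excluded

console.log(keptIntended);
console.log(keptUnescaped);
```

With the unescaped pattern, "system_js" is excluded along with the real system collection, which is consistent with the behaviour reported in this ticket.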
| Comment by Vishnu Kaushik [ 30/Jun/22 ] | |||||||||||
|
Ok, I verified that it happens on the 6.0 binary as well, commit hash 952ed79880ec280dce20c95ce3b178036d366771. | |||||||||||
| Comment by Jennifer Peshansky (Inactive) [ 30/Jun/22 ] | |||||||||||
|
From a glance, the upgrade isn't involved in and of itself, since the regex works correctly in some situations in the shell but not in others. It seems to have to do with how the change stream code parses slashes? | |||||||||||
| Comment by Kyle Suarez [ 30/Jun/22 ] | |||||||||||
|
jennifer.peshansky@mongodb.com do you think that the PCRE2 Upgrade is potentially involved here? | |||||||||||
| Comment by Vishnu Kaushik [ 30/Jun/22 ] | |||||||||||
|
I was running this locally with FCV 6.0, but the binary version is master. | |||||||||||
| Comment by Kyle Suarez [ 30/Jun/22 ] | |||||||||||
|
vishnu.kaushik@mongodb.com what version was this run on? Master or 6.0? |