[SERVER-11947] Add a regex expression to the aggregation language Created: 04/Dec/13 Updated: 19/Jun/19 Resolved: 30/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.11 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Norberto Fernando Rocha Leite (Inactive) | Assignee: | Arun Banala |
| Resolution: | Done | Votes: | 42 |
| Labels: | asya, expression |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Case: | (copied to CRM) |
| Description |
| Comments |
| Comment by Arun Banala [ 19/Jun/19 ] | |||||||||||||||||||||||||||||||||||
|
Vokail Yes! This feature is currently available in development version 4.1.11 and will be in production release 4.2. You can find an overview of this feature in the user summary box at the top of this ticket. You can find more detailed examples for $regexFind and $regexFindAll in the upcoming release docs section. Thank you for your continued interest in this feature! | |||||||||||||||||||||||||||||||||||
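A minimal sketch of how the two expressions named above might be written from Python, as plain dictionaries that a driver could send as a pipeline. The field name `description`, the output field names, and the pattern are hypothetical. `regex_find_local` is only a rough local stand-in built on Python's `re` module, which is not the server's regex engine, so results can differ for some patterns.

```python
import re

# Hypothetical pipeline: "description" is an assumed string field.
# $regexFind yields the first match (or null); $regexFindAll yields an array of matches.
pipeline = [
    {"$project": {
        "first_hit": {"$regexFind": {"input": "$description", "regex": r"(\d+)-(\w+)"}},
        "all_hits": {"$regexFindAll": {"input": "$description",
                                       "regex": r"(\d+)-(\w+)",
                                       "options": "i"}},
    }}
]


def regex_find_local(text, pattern):
    """Approximate the shape of a $regexFind result using Python's re (illustration only)."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # match: matched substring, idx: start offset, captures: one entry per capture group
    return {"match": m.group(0), "idx": m.start(), "captures": list(m.groups())}
```

The local helper exists only so the result shape can be inspected without a running server; the pipeline itself would need a 4.1.11+ server to run.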
| Comment by Vokail [ 19/Jun/19 ] | |||||||||||||||||||||||||||||||||||
|
I'm not sure I understand: is this issue closed, and is the feature available in version 4.2?
| Comment by Asya Kamsky [ 14/Feb/19 ] | |||||||||||||||||||||||||||||||||||
|
brucehappy that edit was made in error - you can disregard it. We are still working on this and while we cannot guarantee that it will make 4.2 we are doing our best. | |||||||||||||||||||||||||||||||||||
| Comment by Bruce Duncan [ 12/Feb/19 ] | |||||||||||||||||||||||||||||||||||
|
For all of us waiting (for years) for this feature to finally be included as part of 4.2, seeing the fix version change to Q3-2019 is more than a little depressing @Asya | |||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 29/Jan/19 ] | |||||||||||||||||||||||||||||||||||
|
Starts with and ends with are both doable currently via one of the available string expressions. Example for "f1" ends with "f2":
Note that $expr is available since 3.6 and allows you to use all of the string expressions from aggregation. Similarly, "starts with" should compare the contents of "f2" with the substring of "f1" from 0 to the length of "f2".
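A sketch of one way to express both checks with `$expr`, written as Python dictionaries. This is not necessarily the example originally posted with this comment; the helper names are hypothetical, and it assumes both fields hold strings in every document (`$strLenCP` errors on missing or non-string values).

```python
def starts_with(f1: str, f2: str) -> dict:
    """Filter: field f1 starts with the contents of field f2."""
    # Compare f2 with the substring of f1 from 0 to the length of f2.
    return {"$expr": {"$eq": [
        f"${f2}",
        {"$substrCP": [f"${f1}", 0, {"$strLenCP": f"${f2}"}]},
    ]}}


def ends_with(f1: str, f2: str) -> dict:
    """Filter: field f1 ends with the contents of field f2."""
    len1 = {"$strLenCP": f"${f1}"}
    len2 = {"$strLenCP": f"${f2}"}
    # Clamp the start index at 0 so a longer f2 does not produce a negative offset.
    start = {"$max": [0, {"$subtract": [len1, len2]}]}
    return {"$expr": {"$eq": [
        f"${f2}",
        {"$substrCP": [f"${f1}", start, len2]},
    ]}}
```

Either result can be passed as a `find` filter or used inside a `$match` stage, for example `[{"$match": ends_with("f1", "f2")}]`.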
| Comment by Vokail [ 29/Jan/19 ] | |||||||||||||||||||||||||||||||||||
|
I also want to highlight that the solution above does not work for "endswith", because $indexOfBytes returns only the first match.
| Comment by Vokail [ 29/Jan/19 ] | |||||||||||||||||||||||||||||||||||
|
@Asya Kamsky sure, I've provided all information here: https://stackoverflow.com/questions/54365355/mongodb-regex-in-aggregation-using-reference-to-field-value
From my understanding, it seems the only way is to add a field and use $indexOfBytes.
| Comment by Asya Kamsky [ 28/Jan/19 ] | |||||||||||||||||||||||||||||||||||
|
Vokail you can already use regular regex find syntax during $match stage. Can you clarify your question/use case? | |||||||||||||||||||||||||||||||||||
| Comment by Vokail [ 24/Jan/19 ] | |||||||||||||||||||||||||||||||||||
|
I'm also looking for this, in particular:
is there a workaround for using a regex with the $match operator during aggregation?
| Comment by Asya Kamsky [ 03/Dec/18 ] | |||||||||||||||||||||||||||||||||||
|
This feature is still scheduled to be worked on for the 4.2 release; however, it is not guaranteed to be in the 4.2 release until this ticket is closed (i.e. the code is committed).
| Comment by Bruce Duncan [ 03/Dec/18 ] | |||||||||||||||||||||||||||||||||||
|
Could I please get an update on whether this feature is still scheduled for release as part of v4.2? | |||||||||||||||||||||||||||||||||||
| Comment by Bruce Duncan [ 22/Aug/18 ] | |||||||||||||||||||||||||||||||||||
|
@asya Is this issue still on track for 4.1 with stable release in 4.2? | |||||||||||||||||||||||||||||||||||
| Comment by Wendong Wu [ 26/Mar/18 ] | |||||||||||||||||||||||||||||||||||
|
@Asya Kamsky I think this is for languages which don't support the /pattern/ syntax for regex, for example Python. With the $regex syntax, the user can specify that the following string is a regex pattern. You could argue, though, that a Python user can pass re.compile("pattern") when writing the query.
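To illustrate the two spellings discussed here for an ordinary query or `$match` filter, a small sketch in Python. The field name `name` and the pattern are hypothetical; the second form relies on the driver (for example PyMongo) converting a compiled Python pattern into a BSON regular expression.

```python
import re

# Operator form: the pattern travels as a plain string, flags go in $options.
by_operator = {"name": {"$regex": "^mongo", "$options": "i"}}

# Compiled-pattern form: Python has no /pattern/ literal, so re.compile takes its place.
by_compiled = {"name": re.compile("^mongo", re.IGNORECASE)}

# Either filter can be used in a $match stage.
pipeline = [{"$match": by_operator}]
```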
| Comment by Asya Kamsky [ 24/Mar/18 ] | |||||||||||||||||||||||||||||||||||
|
charlie.swanson is there a reason we need to accept both /pattern/ and {$regex:} syntax for the regex expression? wan.bachtiar pointed out that to accept a "$"-prefixed document in aggregation we would need to change the parser, which would otherwise reject it as an unrecognized agg expression. I couldn't think of any reason to accept the $regex subdocument syntax rather than just documenting that the /pattern/opt syntax should be used. Can you?
| Comment by Charlie Swanson [ 23/Jun/17 ] | |||||||||||||||||||||||||||||||||||
|
Updated description to match our draft syntax proposal/examples | |||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 05/Jun/17 ] | |||||||||||||||||||||||||||||||||||
|
Hi brucehappy, Sorry for the confusion. I looked at your patch and I believe I now understand your desired change. Unfortunately, we cannot accept this change, as the code you changed is the generic implementation of comparing values within the aggregation framework. This will change more than just the behavior of the $eq operator, in ways that are not consistent with elsewhere in the server. The most obvious place is in the sorting semantics. In both the query and aggregation systems, we must be able to sort values of different types, and we do this by putting all regexes after strings (if you trace the calls, you can eventually see that strings have a canonical type code of 10, whereas regexes have a canonical type code of 50, which means regexes will always come after strings. The code for comparisons within the query system is similar, though implemented in bsonelement.cpp).
Changing the sorting behavior in this way doesn't really make sense. Imagine you have a regex r, and strings a, b, and c. Now imagine that a < b < c, but that r matched both a and c. Where should r go in the sort order? Further, although it is subtle, the aggregation system's implementation of $eq is actually consistent with the query system's implementation. I understand and sympathize with the confusion, but the match expression {a: /regex/} is actually different than the match expression {a: {$eq: /regex/}}. The {a: /regex/} syntax is really just a shorthand for {a: {$regex: "regex"}}, which I think is pretty confusing:
The aggregation comparison is actually consistent with this latter syntax, which is parsed into an equality match expression, rather than a regex match expression. We don't have an equivalent expression within the aggregation framework, which I think is the most obvious way to gain this functionality without fundamentally changing comparisons between values of different types. With all this context, I think we will need to reject the pull request, and work to provide some sort of $regex expression within the aggregation framework which will give you the functionality you need. We are currently reviewing a draft proposal for this change, which describes the addition of new $regexMatch and $regexSearch expressions. We will provide more details on the desired syntax and behavior once the draft has been approved. Best,
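The difference between a regex match and an equality match against a regex value can be sketched in Python as follows. The two filters mirror the shell forms {a: /^ab/} and {a: {$eq: /^ab/}}; the two helper functions are a simplified local model written for illustration (scalar values only, no arrays), not the server's implementation.

```python
import re

pattern = re.compile("^ab")

regex_filter = {"a": pattern}             # shell: {a: /^ab/}, runs the pattern against strings
equality_filter = {"a": {"$eq": pattern}}  # shell: {a: {$eq: /^ab/}}, compares with the regex value


def matches_regex(value, pat):
    """Simplified model of a regex match: a string value the pattern finds a match in."""
    return isinstance(value, str) and pat.search(value) is not None


def matches_eq(value, pat):
    """Simplified model of $eq: the stored value must itself be the same regex."""
    return (isinstance(value, re.Pattern)
            and value.pattern == pat.pattern
            and value.flags == pat.flags)
```

Under this model the string "abc" satisfies the regex filter but not the equality filter, while a stored regex value with the same pattern satisfies only the equality filter.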
| Comment by Bruce Duncan [ 26/May/17 ] | |||||||||||||||||||||||||||||||||||
|
Hello charlie.swanson, What I have implemented is not a new operator, but rather a more modest change to alter how the comparison is performed between a String/Symbol type and a RegEx type in $project and $group. I would argue that my change is actually a bug fix for the very surprising existing behavior of having a regex in a $cond or other comparison not actually execute against the string when inside one of these boolean expressions. I can certainly understand that people would want a new/modified operator to do capturing groups and other regex match stuff in a context outside of a boolean expression, and in fact that would help me with some other features I am working on, but to me, the creation of that new operator (as expressed in this JIRA issue, I would suggest reopening Thanks | |||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 26/May/17 ] | |||||||||||||||||||||||||||||||||||
|
Hi brucehappy, Thank you so much for your pull request! We always give careful thought to the syntax and semantics when adding a new operator to the aggregation language. We want to make sure it is simple to use, has semantics that users will not find surprising, and will be genuinely useful. As part of this process, we need to go through some internal review to agree on what the syntax and semantics should be. I've taken over this process and will try to speed this discussion along so we can get to your pull request! I would like to warn you that the syntax we agree upon is often different than the simplistic use-case pointed out in the original report (SERVER ticket). In this particular case, we'll at least consider adding support for capture groups ( Thank you, and don't hesitate to ask if you have any questions.
| Comment by Kelsey Schubert [ 19/May/17 ] | |||||||||||||||||||||||||||||||||||
|
Hi brucehappy, Thank you for the pull request! We'll review it and provide our comments. Kind regards, | |||||||||||||||||||||||||||||||||||
| Comment by Bruce Duncan [ 17/May/17 ] | |||||||||||||||||||||||||||||||||||
|
I have submitted a PR to implement this feature here:
And with the changes the output looks like this:
Looking forward to working with backlog-server-query to get this integrated. | |||||||||||||||||||||||||||||||||||
| Comment by Bruce Duncan [ 16/May/17 ] | |||||||||||||||||||||||||||||||||||
|
I completely agree with wendwu. Having a regular expression as part of the $project or $group stages in a pipeline would be extremely helpful in doing the sort of bucketing that wendwu demonstrates above, which cannot be accomplished once the regular expression used is complex (cannot be implemented using $split, etc). But it should be noted that the description of this issue is actually requesting something other than regular expression value/string value equality support in the $project and $group stages of the aggregation pipeline. It is requesting support for regular expression match extraction during $project.
| Comment by Wendong Wu [ 02/May/17 ] | |||||||||||||||||||||||||||||||||||
|
asya Thanks for the suggestion. I believe $indexOfCP and $split can solve most of the pattern matching/searching problems. But I think regex is useful in many cases, especially when the patterns get more complicated than splitting on "_". Though we may still find a way to use nested $or and $and expressions to simulate the logic, it would be easier to use $regex if it were supported.
| Comment by Asya Kamsky [ 02/May/17 ] | |||||||||||||||||||||||||||||||||||
|
wendwu you can use $split expression to get the same substring here:
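A hypothetical sketch of the $split approach, written as a Python pipeline. The field name `name`, the sample value, and the output field `category` are assumptions made for illustration; only the "_" delimiter comes from this thread.

```python
# Assumed document shape: {"name": "fruit_apple"}; the category is the text before "_".
category_expr = {"$arrayElemAt": [{"$split": ["$name", "_"]}, 0]}

pipeline = [
    # Extract the prefix, then count documents per prefix.
    {"$project": {"category": category_expr}},
    {"$group": {"_id": "$category", "count": {"$sum": 1}}},
]


def category_local(name: str) -> str:
    """Local Python equivalent of the $split / $arrayElemAt expression above."""
    return name.split("_")[0]
```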
| |||||||||||||||||||||||||||||||||||
| Comment by Wendong Wu [ 02/May/17 ] | |||||||||||||||||||||||||||||||||||
|
This feature is very useful for preprocessing the data and then sending it to the $group stage afterwards. For example, for a collection like this,
We can have the aggregation pipeline go like this to categorize the data and calculate the count for each category.
|