Type: New Feature
Resolution: Done
Priority: Major - P3
Fix Version/s: 4.1.11
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
- asya
- expression

Backwards Compatibility:
Fully Compatible

Issue Status as of May 10, 2019

FEATURE DESCRIPTION
This feature adds three new expressions, $regexFind, $regexFindAll and $regexMatch, to the aggregation language. The $regexFind and $regexFindAll expressions allow regex matching and capturing. $regexMatch is syntactic sugar on top of $regexFind for cases that only need to test whether the input matches.

VERSIONS
This feature is available in the 4.1.11 and newer development versions of MongoDB, and in the 4.2 and newer production releases.

RATIONALE
Regex search is a powerful feature of the match language, but it does not exist within the aggregation framework. Adding it would unlock many string-manipulation use cases and bring the two languages closer together. MongoDB Stitch would also be able to use these expressions to let users define visibility rules with regular expressions.

OPERATION

Syntax

Input

{$regexFind:{             // returns the first match found
    input: <expression>,
    regex: <expression>,
    options: <expression> // optional
}}

{$regexFindAll:{          // returns every match
    input: <expression>,
    regex: <expression>,
    options: <expression> // optional
}}

{$regexMatch:{          // returns true/false
    input: <expression>,
    regex: <expression>,
    options: <expression> // optional
}}

input: string, or expression evaluating to a string
regex: /pattern/opts, or "string pattern", or expression resolving to a regex type. Does not support the extended JSON regex syntax of {$regex: <string>, $options: <options>}.
options: "imsx", or expression resolving to a string

Note that this syntax differs from the syntax used to specify regexes and options elsewhere in the server. The $regex match expression may take the form {$regex: <pattern>, $options: <options>}. The important difference is that the 'regex' and 'options' fields are hoisted into the top-level object. This avoids repeating "regex" (e.g. {input: "x", regex: {$regex: "xyz", $options: "123"}}). Here are some examples:

{$regexFind: {input:"$text", regex: /pattern/opts}}
{$regexMatch: {input:"hello world", regex: "$pathToRegexField"}}
{$regexFindAll: {input:"$text", regex: "pattern", options: "mi"}}
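
The hoisting described above can be illustrated with a small sketch in plain JavaScript. The helper name hoistRegexSpec is hypothetical and not part of the server or shell; it only shows how a match-language style {$regex, $options} spec maps onto the argument object used by the new expressions.

```javascript
// Hypothetical helper (not a server or shell API): converts a match-language
// regex spec into the argument object shared by $regexFind, $regexFindAll
// and $regexMatch.
function hoistRegexSpec(input, spec) {
    const args = { input: input, regex: spec.$regex };
    // options is optional, so it is copied only when the spec carries one.
    if (spec.$options !== undefined) {
        args.options = spec.$options;
    }
    return args;
}

const findExpr = {
    $regexFind: hoistRegexSpec("$text", { $regex: "pattern", $options: "mi" })
};
console.log(JSON.stringify(findExpr));
// prints {"$regexFind":{"input":"$text","regex":"pattern","options":"mi"}}
```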

options includes all the regex options currently supported in the match language:
'i' - case insensitive
'm' - newlines match ^ and $
'x' - extended mode (allows for comments, ignores whitespace in the regex, etc.)
's' - allows . to include newline characters

Output

$regexFind returns a single document in the format below for the leftmost substring of input that matches the regex, or null if no such substring exists. $regexFindAll returns an array of documents, one for each substring of input that matches the regex, each in the same format. If no matches are found, it returns an empty array.

`$regexFind`

{
   match: <string>,
   captures: [<string>, <string>, ...],
   idx: <non-negative integer>
}

`$regexFindAll`

[{
   match: <string>,
   captures: [<string>, <string>, ...],
   idx: <non-negative integer>
}, ...]

match: the string that the pattern matched.
captures: an array of substrings within the match captured by parentheses in the regex pattern, ordered by the appearance of the opening parentheses from left to right. This is an empty array if there were no captures.
idx: a zero-based index indicating where the first character of the match appears in the text being searched. The index counts code points, not bytes.
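
To make the three output fields concrete, here is a minimal sketch that emulates the $regexFind result shape with JavaScript's own RegExp engine. It is an illustration only, not the server implementation, and the function name regexFindSketch is an assumption. The second call uses a string containing an emoji to show the difference between a code-point index and a UTF-16 offset.

```javascript
// Plain-JavaScript emulation of the $regexFind result shape (not server code).
function regexFindSketch(input, regex) {
    const m = regex.exec(input);
    if (m === null) {
        return null; // no match: $regexFind returns null
    }
    return {
        match: m[0],
        // Capture groups, ordered by their opening parenthesis.
        captures: m.slice(1),
        // m.index counts UTF-16 code units; spreading a string iterates by
        // code point, so this converts the offset into a code-point index.
        idx: [...input.slice(0, m.index)].length
    };
}

console.log(regexFindSketch("text with 02 digits", /[0-9]+/));
// { match: '02', captures: [], idx: 10 }
console.log(regexFindSketch("😀 02", /[0-9]+/).idx);
// 2 (the emoji is one code point, although it occupies two UTF-16 units)
```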

We will also provide an alias, $regexMatch, for checking whether any substring matches a regex.

$regexMatch is sugar for:

{$ne: [ {$regexFind: { <arguments> } }, null ] }
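
The equivalence can be sketched in plain JavaScript as follows. Both function names are hypothetical stand-ins that emulate the semantics with JavaScript's RegExp engine; they are not the server's code.

```javascript
// Emulation only: a stand-in for $regexFind that yields a result or null.
// idx here is a UTF-16 offset, which is adequate for these ASCII inputs.
function regexFindSketch(input, regex) {
    const m = regex.exec(input);
    return m === null ? null : { match: m[0], captures: m.slice(1), idx: m.index };
}

// Mirrors {$ne: [{$regexFind: {...}}, null]}.
function regexMatchSketch(input, regex) {
    return regexFindSketch(input, regex) !== null;
}

console.log(regexMatchSketch("212-456-7890", /^212.*$/));  // true
console.log(regexMatchSketch("1-800-212-000", /^212.*$/)); // false
```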

This expression won’t be collation aware, so string comparisons implied by the regex will not match the collation (for example if a collection has a case-insensitive collation, the regex will not “automatically” perform a case-insensitive comparison).
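
As a sketch of the consequence, a pipeline that needs case-insensitive matching would have to request it explicitly through options, regardless of the collection's collation. The field name text and the output field matched are assumptions borrowed from the style of the examples in this ticket; the snippet only builds the pipeline as a literal and prints it.

```javascript
// Pipeline literal only; in a shell it would be passed to db.coll.aggregate(...).
const caseInsensitivePipeline = [{
    $project: {
        matched: {
            $regexMatch: {
                input: "$text",
                regex: "simple",
                options: "i" // explicit; the collation does not supply it
            }
        }
    }
}];

console.log(JSON.stringify(caseInsensitivePipeline, null, 2));
```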

Examples

Basic search with captures

Collection

{_id: 0, text:"Simple example"}

Pipeline

db.coll.aggregate([{
    $project: {
        matches: {
            $regexFindAll: {
                input: "$text",
                regex: "(m(p))",
            }
        }
    }
}])

Output

{
    _id: 0,
    matches: [
        {
            match: "mp",
            captures: ["mp", "p"],
            idx: 2
        },
        {
            match: "mp",
            captures: ["mp", "p"],
            idx: 10
        }
    ]
}
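
The idx values and the ordering of nested captures in this output can be cross-checked with a small emulation in plain JavaScript. This is a sketch under the assumption that JavaScript's RegExp behaves like the server's regex engine for this simple pattern; regexFindAllSketch is a hypothetical name.

```javascript
// Plain-JavaScript emulation of the $regexFindAll result shape (not server code).
function regexFindAllSketch(input, pattern) {
    const results = [];
    // String.prototype.matchAll requires a regex with the "g" flag.
    for (const m of input.matchAll(new RegExp(pattern, "g"))) {
        results.push({
            match: m[0],
            captures: m.slice(1),
            // Convert the UTF-16 offset into a code-point index.
            idx: [...input.slice(0, m.index)].length
        });
    }
    return results;
}

console.log(JSON.stringify(regexFindAllSketch("Simple example", "(m(p))")));
// [{"match":"mp","captures":["mp","p"],"idx":2},{"match":"mp","captures":["mp","p"],"idx":10}]
```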

Email extraction

Collection

{_id: 0,  text:"Some field text with email norberto@mongodb.com"}

Pipeline

db.coll.aggregate([{
    $project: {
        match: {
            $regexFind: {
                input: "$text",
                regex: /([a-zA-Z0-9._-]+)@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/
            }
        }
    }
}])

Output

{
    _id: 0,
    match: {
        match: "norberto@mongodb.com",
        captures: ["norberto"],
        idx: 27  
    }
}

No matches ($regexFind)

Collection

{_id: 0,  text: "Some text with no matches"}

Pipeline

db.coll.aggregate([{
    $project: {
        match: {
            $regexFind: {
                input: "$text",
                regex:/not present/
            }
        }
    }
}])

Output

{_id: 0, match: null}

No matches ($regexFindAll)

Collection

{_id: 0,  text: "Some text with no matches"}

Pipeline

db.coll.aggregate([{
    $project: {
        matches: {
            $regexFindAll: {
                input: "$text",
                regex:/not present/
            }
        }
    }
}])

Output

{_id: 0, matches: []}

Using regex stored in the document

Collection

{_id: 0, text: "text with 02 digits", regexField: /[0-9]+/}

Pipeline

db.coll.aggregate([{
    $project: {
        match: {
            $regexFind: {
                input: "$text",
                regex: "$regexField",
            }
        }
    }
}])

Output

{_id: 0, match: {match: "02", captures: [], idx: 10}}

Using $regexMatch in a $cond

Collection

{_id: 0, phoneNumber: "212-456-7890"}
{_id: 1, phoneNumber: "1-800-212-000"}

Pipeline

db.coll.aggregate([{
    $project: {
        region: {
            $cond: {
                if: {
                    $regexMatch: {
                        input: "$phoneNumber",
                        regex: "^212.*$",
                    }
                },
                then: "New York",
                else: "Somewhere Else"
            }
        }
    }
}])

Output

{_id: 0, region: "New York"}
{_id: 1, region: "Somewhere Else"}

Non-overlapping captures

Input

{_id: 0, text:"aaaaa"}

Pipeline

db.coll.aggregate([{
    $project: {
        matches: {
            $regexFindAll: {
                input: "$text",
                regex: "(a*)",
            }
        }
    }
}])

Output

{
    _id: 0,
    matches: [
        {
            match: "aaaaa",
            captures: ["aaaaa"],
            idx: 0
        },
    ]
}

The purpose of the above example is to show that after a match is found, the search for the next match starts at the end of the previous one (e.g. instead of returning separate matches for "a", "aa" and "aaa", a single match for "aaaaa" is returned). This matches the behavior provided by Python, JavaScript and other languages. If other behavior is required, the non-greedy ? operator can be used, e.g. /(a+?)/.
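
Since the paragraph above compares the behavior to JavaScript, the difference between greedy and non-greedy matching can be sketched directly with JavaScript's RegExp. The helper name matchesOf is hypothetical, and the sketch uses a+ rather than a* to sidestep empty matches, whose handling may differ between engines.

```javascript
// Lists every non-overlapping match with its starting offset.
function matchesOf(input, regex) {
    return [...input.matchAll(regex)].map(m => ({ match: m[0], idx: m.index }));
}

// Greedy: the first match consumes the whole string, so nothing is left.
console.log(JSON.stringify(matchesOf("aaaaa", /(a+)/g)));
// [{"match":"aaaaa","idx":0}]

// Non-greedy: each match takes a single "a", and the search resumes after it.
console.log(JSON.stringify(matchesOf("aaaaa", /(a+?)/g)));
// [{"match":"a","idx":0},{"match":"a","idx":1},{"match":"a","idx":2},{"match":"a","idx":3},{"match":"a","idx":4}]
```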

SERVER-11947_fix.js
3 kB
May 17 2017 11:49:45 PM UTC

is duplicated by

SERVER-13902 Reverse regex functionality for queries

Closed

SERVER-32470 Support for $regex operator in $filter of aggregation pipeline.

Closed

SERVER-34122 Pattern Search Support in $filter & $cond operator

Closed

SERVER-9159 Use Regex capture groups with projections

Closed

SERVER-8892 Use $regex as the expression in a $cond

Closed

is related to

SERVER-36261 Support field projection based on string inside of field name

Backlog

SERVER-33389 aggregate function similar to regex replace

Closed

related to

SERVER-39694 Implement $regexMatch as syntactic sugar on top of $regexFind

Closed

SERVER-9156 Projection by a substring match

Closed

SERVER-13902 Reverse regex functionality for queries

Closed

SERVER-22104 $instr function to locate position of a "pattern" within a "string"

Closed

SERVER-8951 Add $findChar or $indexOf operator for strings to find position of specific character (or substring)

Closed

SERVER-39695 Implement $regexFind

Closed

SERVER-39696 Implement $regexFindAll

Closed

links to

Pull Request #1150

