- Type: New Feature
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Linq
- None
Syntax
{$split: [<expression>, <expression>]}
Examples
// Example 1
> db.coll.insert([ {_id: 1, string: "abracadabra"} ]);
> db.coll.aggregate([{ $project: { split: {$split: ["$string", "a"]} } }]);
{_id: 1, split: ["", "br", "c", "d", "br", ""]}

// Example 2
> db.coll.insert([ {_id: 1, string: "zero one zero zero one zero one"} ]);
> db.coll.aggregate([{ $project: { split: {$split: ["$string", "one"]} } }]);
{_id: 1, split: ["zero ", " zero zero ", " zero ", ""]}

// Example 3
> db.coll.insert([ {_id: 1, string: "hello world"} ]);
> db.coll.aggregate([{ $project: { split: {$split: ["$string", "notInTheString"]} } }]);
{_id: 1, split: ["hello world"]}
Notes
- Functionally identical to Python's str.split() called with an explicit separator, including the empty string that is output when the separator occurs at the beginning or end of the string.
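The leading and trailing empty strings can be checked outside the server. As a rough sketch in plain JavaScript (the language of the shell examples), the built-in `String.prototype.split` with a string separator produces the same arrays as the three examples. The helper name `splitLikeExamples` is hypothetical and exists only for illustration; it is not part of the server or any driver API.

```javascript
// Hypothetical helper: reproduces the arrays shown in the examples
// by delegating to JavaScript's built-in String.prototype.split.
function splitLikeExamples(input, separator) {
  return input.split(separator);
}

// Separator at the start and end of the string yields empty strings.
console.log(splitLikeExamples("abracadabra", "a"));

// Multi-character separator, also occurring at the very end.
console.log(splitLikeExamples("zero one zero zero one zero one", "one"));

// Separator not present: the whole string is returned as one element.
console.log(splitLikeExamples("hello world", "notInTheString"));
```

This covers only the non-empty separators used in the examples; behaviour for an empty separator is not described in this ticket and is not modelled here.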
Errors
- An error is raised if either input expression does not evaluate to a string.
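The error rule can be mirrored client-side with a simple type check. The following is a minimal sketch in plain JavaScript, assuming only the rule stated above; `checkedSplit` and its error message are hypothetical and do not reproduce the server's actual error text or code.

```javascript
// Hypothetical guard: both arguments must be strings, otherwise an
// error is thrown. The message text is illustrative only.
function checkedSplit(input, separator) {
  if (typeof input !== "string" || typeof separator !== "string") {
    throw new TypeError("split requires two string arguments");
  }
  return input.split(separator);
}

// A non-string first argument is rejected.
try {
  checkedSplit(42, "a");
} catch (e) {
  console.log(e.message);
}

// Two strings are split as usual.
console.log(checkedSplit("a,b", ","));
```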
Old Description
It would be nice to have more operators for dealing with strings in the aggregation framework. A very useful operator would be a simple split to be used in the $project part of the pipeline.
If I could write something like:
{$project : {words : {$split : "$some_textual_field"}}}
that would turn the string contained in the given field into a word list (split on whitespace and punctuation, or perhaps on a regexp defining where to split).
This would be tremendously useful for natural language processing and similar tasks.
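For comparison, the word-list behaviour requested here can be sketched client-side in plain JavaScript. This is only an illustration of the idea, not what the implemented operator does (the syntax above takes an explicit separator expression). The function name `toWordList` and the particular set of punctuation characters are assumptions made for the example.

```javascript
// Hypothetical tokenizer: splits on runs of whitespace and a small,
// assumed set of punctuation characters, then drops empty strings.
function toWordList(text) {
  return text.split(/[\s.,;:!?]+/).filter((word) => word.length > 0);
}

console.log(toWordList("Hello, world! This is a test."));
```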
- depends on: SERVER-6773 Aggregation operator $split for splitting string based on a separator (Closed)