[DOCS-9561] Docs for SERVER-6773: Aggregation operator $split for splitting string based on a separator Created: 05/Dec/16  Updated: 11/Jan/17  Resolved: 05/Dec/16

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: 01112017-cleanup

Type: Task Priority: Major - P3
Reporter: Emily Hall Assignee: Jonathan DeStefano
Resolution: Duplicate Votes: 0
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-6773 Aggregation operator $split for split... Closed
Duplicate
duplicates DOCS-8696 Document $split aggregation operator Closed
Participants:
Days since reply: 7 years, 10 weeks, 2 days ago
Epic Link: PM-507

Description

Engineering Ticket Description:

Syntax

{$split: [<expression>, <expression>]}

Examples

// Example 1
> db.coll.insert([
  {_id: 1, string: "abracadabra"}
]);
> db.coll.aggregate([{
  $project: {
    split: {$split: ["$string", "a"]}
  }
}]);
{_id: 1, split: ["", "br", "c", "d", "br", ""]}
 
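// Example 2
// Start from an empty collection so that inserting _id: 1 does not collide with the previous example.
> db.coll.drop();
> db.coll.insert([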
  {_id: 1, string: "zero one zero zero one zero one"}
]);
> db.coll.aggregate([{
  $project: {
    split: {$split: ["$string", "one"]}
  }
}]);
{_id: 1, split: ["zero ", " zero zero ", " zero ", ""]}
 
 
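// Example 3
// Start from an empty collection so that inserting _id: 1 does not collide with the previous example.
> db.coll.drop();
> db.coll.insert([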
  {_id: 1, string: "hello world"}
]);
> db.coll.aggregate([{
  $project: {
    split: {$split: ["$string", "notInTheString"]}
  }
}]);
{_id: 1, split: ["hello world"]}

Notes

  • Functionality is identical to Python's str.split() called with an explicit separator, including an empty string in the output when the separator occurs at the beginning or end of the string.
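
The edge cases in this note can be checked outside the server. The following is a minimal sketch in plain JavaScript, not the server implementation. It assumes a non-empty string separator, for which JavaScript's built-in String.prototype.split produces the same arrays as the examples in this ticket. The helper name splitLike is hypothetical.

```javascript
// Hypothetical helper modelling the results shown in the examples.
// Assumption: `separator` is a non-empty string; other inputs are not modelled here.
function splitLike(input, separator) {
  return input.split(separator);
}

// Separator at both ends: the result starts and ends with an empty string.
console.log(JSON.stringify(splitLike("abracadabra", "a")));

// Separator only at the end: the result ends with an empty string.
console.log(JSON.stringify(splitLike("zero one zero zero one zero one", "one")));

// Separator absent: the result is a one-element array holding the whole input.
console.log(JSON.stringify(splitLike("hello world", "notInTheString")));
```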

Errors

  • $split raises an error if either input expression does not evaluate to a string.
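
The error rule can be illustrated with a small type check. This is a sketch in plain JavaScript under stated assumptions: the function name splitStrict, the TypeError class, and the message text are all hypothetical, since the ticket does not give the server's actual error code or wording.

```javascript
// Hypothetical model of the error rule: both operands must be strings.
// The error type and message are assumptions, not the server's real output.
function splitStrict(input, separator) {
  if (typeof input !== "string" || typeof separator !== "string") {
    throw new TypeError("$split requires both arguments to evaluate to strings");
  }
  return input.split(separator);
}

// A string input with a string separator is accepted.
console.log(JSON.stringify(splitStrict("hello world", " ")));

// A non-string input is rejected.
try {
  splitStrict(42, "a");
} catch (err) {
  console.log(err.name + ": " + err.message);
}
```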

Old Description
It would be nice to have more operators for dealing with strings in the aggregation framework. A very useful operator would be a simple split to be used in the $project part of the pipeline.

If I could write something like:

{$project : {words : {$split : "$some_textual_field"}}}

the string contained in the given field would be turned into a list of words (split on whitespace and punctuation marks, or perhaps on a regexp that defines where to split).

This would be tremendously useful for natural language processing and similar tasks.


Generated at Thu Feb 08 07:58:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.