[SERVER-6773] Aggregation operator $split for splitting string based on a separator
Created: 15/Aug/12  Updated: 03/May/17  Resolved: 26/Apr/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 3.3.6

Type: New Feature
Priority: Minor - P4
Reporter: Rafael Calsaverini
Assignee: Benjamin Murphy
Resolution: Done
Votes: 16
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by CSHARP-1650 Operator for splitting string based o... Closed
Documented
is documented by DOCS-9561 Docs for SERVER-6773: Aggregation ope... Closed
is documented by DOCS-8696 Document $split aggregation operator Closed
Related
is related to DRIVERS-297 Aggregation Framework Support for 3.4 Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 12 (04/04/16), Query 13 (04/22/16), Query 14 (05/13/16)
Participants:

 Description   

Syntax

{$split: [<expression>, <expression>]}

Examples

// Example 1
> db.coll.insert([
  {_id: 1, string: "abracadabra"}
]);
> db.coll.aggregate([{
  $project: {
    split: {$split: ["$string", "a"]}
  }
}]);
{_id: 1, split: ["", "br", "c", "d", "br", ""]}
 
// Example 2 (assumes coll is empty, since _id: 1 is reused)
> db.coll.insert([
  {_id: 1, string: "zero one zero zero one zero one"}
]);
> db.coll.aggregate([{
  $project: {
    split: {$split: ["$string", "one"]}
  }
}]);
{_id: 1, split: ["zero ", " zero zero ", " zero ", ""]}
 
 
// Example 3 (assumes coll is empty, since _id: 1 is reused)
> db.coll.insert([
  {_id: 1, string: "hello world"}
]);
> db.coll.aggregate([{
  $project: {
    split: {$split: ["$string", "notInTheString"]}
  }
}]);
{_id: 1, split: ["hello world"]}

Notes

  • Functionality is identical to Python's split() called with an explicit separator, including an empty string being output when the separator lies at the beginning or end of the string.
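
The behavior in this note can be checked outside the server. The following is a minimal sketch in plain JavaScript, not the server implementation: splitLike is a hypothetical helper name, and it relies on String.prototype.split, which agrees with Python's split() when the separator is a non-empty string. Empty separators are not covered by this sketch.

```javascript
// Hypothetical helper, for illustration only: mimics the $split semantics
// described in this ticket using plain JavaScript string splitting.
// Assumes both arguments are strings and the separator is non-empty.
function splitLike(input, separator) {
  return input.split(separator);
}

// Separator at the start and end of the string yields empty strings.
console.log(splitLike("abracadabra", "a"));
// [ '', 'br', 'c', 'd', 'br', '' ]

// A multi-character separator at the end yields a trailing empty string.
console.log(splitLike("zero one zero zero one zero one", "one"));
// [ 'zero ', ' zero zero ', ' zero ', '' ]

// A separator that never occurs yields the whole string as one element.
console.log(splitLike("hello world", "notInTheString"));
// [ 'hello world' ]
```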

Errors

  • If either input expression does not evaluate to a string.
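
One way a pipeline might avoid this error on mixed-type data is to apply $split only when the field holds a string. The sketch below is an assumption-laden illustration, not part of this ticket: guardedSplitStage is a hypothetical helper, it assumes the $type aggregation expression is available on the server (it is not introduced by this ticket), that $cond evaluates only the selected branch, and that an empty array is an acceptable fallback. It has not been run against a server here.

```javascript
// Hypothetical helper: builds a $project stage that applies $split only
// when the field is a string, and yields an empty array otherwise.
// Assumes the server supports the $type aggregation expression.
function guardedSplitStage(fieldPath, separator) {
  return {
    $project: {
      split: {
        $cond: [
          // Condition: the field's BSON type is "string".
          { $eq: [{ $type: fieldPath }, "string"] },
          // Then: split as described in this ticket.
          { $split: [fieldPath, separator] },
          // Else: fall back to an empty array (an arbitrary choice).
          { $literal: [] }
        ]
      }
    }
  };
}

// Intended use in the shell:
//   db.coll.aggregate([guardedSplitStage("$string", "a")]);
console.log(JSON.stringify(guardedSplitStage("$string", "a")));
```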

Old Description
It would be nice to have more operators for dealing with strings in the aggregation framework. A very useful operator would be a simple split to be used in the $project part of the pipeline.

If I could write something like:

{$project : {words : {$split : "$some_textual_field"}}}

the string contained in the given field would be turned into a word list (split on whitespace and punctuation, or perhaps on a regexp defining where to split).

This would be tremendously useful for natural language processing and similar tasks.



 Comments   
Comment by Benjamin Murphy [ 26/Apr/16 ]

This ticket introduces the $split expression to aggregation, with semantics as described above. It will need to be documented as an aggregation expression, and any driver that supports helpers for aggregation will need to include support for it.

Comment by Githook User [ 26/Apr/16 ]

Author: Benjamin Murphy (benjaminmurphy) <benjamin_murphy@me.com>

Message: SERVER-6773 Aggregation now supports the split expression.
Branch: master
https://github.com/mongodb/mongo/commit/5c3e0d4855415cbab4bd75732208f832c11f4889

Comment by Nicholas Johnson [ 17/Jul/15 ]

db.collection.aggregate({$project: {tags: {$split : "$tags"}}})

This would be tremendous.
