Documentation / DOCS-9561

Docs for SERVER-6773: Aggregation operator $split for splitting string based on a separator


    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 01112017-cleanup
    • Component/s: None
    • Labels:

      Description

      Engineering Ticket Description:

      Syntax

      {$split: [<string expression>, <delimiter expression>]}
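Both arguments are aggregation expressions, so the string and the delimiter can each be a field path such as "$string" or a literal value. The following is a minimal JavaScript sketch, not the server's implementation, of how the argument array might be resolved against a single document before splitting. `resolveOperand` and `evalSplit` are hypothetical helpers, and only top-level field paths are handled.

```javascript
// Hypothetical helper: resolve a top-level "$field" path against the
// document, or return a literal operand unchanged.
function resolveOperand(doc, operand) {
  if (typeof operand === "string" && operand.startsWith("$")) {
    return doc[operand.slice(1)];
  }
  return operand;
}

// Hypothetical evaluator for {$split: [<string expression>, <delimiter expression>]}.
// Type checking is omitted here; the split itself uses JavaScript's
// String.prototype.split with a string separator.
function evalSplit(doc, spec) {
  const [strExpr, delimExpr] = spec.$split;
  const str = resolveOperand(doc, strExpr);
  const delim = resolveOperand(doc, delimExpr);
  return str.split(delim);
}

const doc = { _id: 1, string: "abracadabra" };
const result = evalSplit(doc, { $split: ["$string", "a"] });
console.log(result); // ["", "br", "c", "d", "br", ""]
```

Resolving operands first and splitting second mirrors the idea that `$split` only ever sees the evaluated values, whatever expressions produced them.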
      

      Examples

      // Example 1
      > db.coll.drop();
      > db.coll.insert([
        {_id: 1, string: "abracadabra"}
      ]);
      > db.coll.aggregate([{
        $project: {
          split: {$split: ["$string", "a"]}
        }
      }]);
      {_id: 1, split: ["", "br", "c", "d", "br", ""]}
       
      // Example 2
      > db.coll.drop();
      > db.coll.insert([
        {_id: 1, string: "zero one zero zero one zero one"}
      ]);
      > db.coll.aggregate([{
        $project: {
          split: {$split: ["$string", "one"]}
        }
      }]);
      {_id: 1, split: ["zero ", " zero zero ", " zero ", ""]}
       
       
      // Example 3
      > db.coll.drop();
      > db.coll.insert([
        {_id: 1, string: "hello world"}
      ]);
      > db.coll.aggregate([{
        $project: {
          split: {$split: ["$string", "notInTheString"]}
        }
      }]);
      {_id: 1, split: ["hello world"]}
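JavaScript's String.prototype.split with a string separator follows the same rules as the examples above, so the expected outputs can be sanity-checked locally. This sketch only compares the documented results against JavaScript's own split; it does not run anything against a server.

```javascript
// Each case: [input string, delimiter, expected result from the examples].
const cases = [
  ["abracadabra", "a", ["", "br", "c", "d", "br", ""]],
  ["zero one zero zero one zero one", "one", ["zero ", " zero zero ", " zero ", ""]],
  ["hello world", "notInTheString", ["hello world"]],
];

for (const [input, delim, expected] of cases) {
  const actual = input.split(delim);
  // Compare arrays by their JSON encoding (element order and values).
  const same = JSON.stringify(actual) === JSON.stringify(expected);
  console.log(`${JSON.stringify(input)} split on ${JSON.stringify(delim)}: ${same ? "matches" : "differs"}`);
}
```

Note the third case: when the delimiter never occurs, the result is a one-element array holding the whole string, not an empty array.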
      

      Notes

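      • Behaves like Python's str.split(sep) called with an explicit separator: the string is split at every occurrence of the delimiter, and an empty string is output wherever the delimiter lies at the beginning or end of the string (or between two consecutive occurrences of the delimiter). Unlike Python's no-argument str.split(), there is no whitespace splitting and empty strings are not dropped.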
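The difference between splitting on an explicit separator and Python's no-argument whitespace splitting is easy to miss. A small JavaScript sketch of both behaviors; `splitOnWhitespace` is a hypothetical helper that only roughly approximates Python's str.split() (JavaScript's `\s` class is not an exact match for Python's whitespace definition).

```javascript
// Separator-based splitting keeps empty strings at both ends and between
// consecutive delimiters, like Python's ",a,,b,".split(",").
const bySeparator = ",a,,b,".split(",");
// bySeparator is ["", "a", "", "b", ""]

// Python's no-argument str.split() instead splits on runs of whitespace and
// drops empty strings. Hypothetical, approximate JavaScript equivalent:
function splitOnWhitespace(s) {
  return s.split(/\s+/).filter((part) => part.length > 0);
}

const byWhitespace = splitOnWhitespace("  zero  one ");
// byWhitespace is ["zero", "one"]

console.log(bySeparator, byWhitespace);
```

Per the note above, `$split` corresponds to the first form only.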

      Errors

      • An error is raised if either input expression does not evaluate to a string.
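A minimal sketch of the type check described above, written as a hypothetical JavaScript function `checkedSplit`; the error messages are illustrative, not the server's. The empty-separator check is an assumption carried over from the Python comparison in the Notes (Python raises an error for an empty separator); the ticket itself does not specify that case.

```javascript
// Hypothetical checked version of the operator's core logic.
function checkedSplit(str, delim) {
  // Per the ticket: both inputs must evaluate to strings.
  if (typeof str !== "string" || typeof delim !== "string") {
    throw new TypeError("$split requires both arguments to evaluate to strings");
  }
  // Assumption: reject an empty separator, as Python's str.split does.
  if (delim.length === 0) {
    throw new RangeError("$split requires a non-empty separator");
  }
  return str.split(delim);
}

try {
  checkedSplit(42, "a");
} catch (e) {
  console.log(e.name); // "TypeError"
}
```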

      Old Description
      It would be nice to have more operators for dealing with strings in the aggregation framework. A very useful operator would be a simple split for use in the $project stage of the pipeline.

      If I could write something like:

      {$project: {words: {$split: "$some_textual_field"}}}

      that would turn the string contained in the given field into a list of words (split on whitespace and punctuation marks, or perhaps on a regular expression defining where to split).

      This would be tremendously useful for natural language processing and similar tasks.


              People

              Assignee:
              jonathan.destefano Jonathan DeStefano
              Reporter:
              emily.hall Emily Hall
              Last commenter:
              Jonathan Dahl
              Votes:
              0
              Watchers:
              1
