Core Server / SERVER-22580

Add $cpLength and $cpSubstr expressions which work via code points

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.4
    • Component/s: Aggregation Framework
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Query 12 (04/04/16)
    • Linked BF Score:
      0

      Description

      Syntax

      {$substrBytes: [ <string>, <expression>, <expression>] }
      {$substrCP: [ <string>, <expression>, <expression>] }
      

      Examples

      Input

      {_id: 0, string: "ελληνικά"}
      

      Pipeline

      db.coll.aggregate([{
          $project: {
              byteSubstr: {$substrBytes: ["$string", 0, 4]},
              cpSubstr: {$substrCP: ["$string", 0, 4]}
          }
      }])
      

      Output

      {_id: 0, byteSubstr: "ελ", cpSubstr: "ελλη"}
      

      Additional Notes

      • Will not add any new query functionality to work with strings.
      • $substrBytes will error if it starts or ends in the middle of a code point.
      • $substrCP will error on any input that is detected to be invalid UTF-8.
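The difference between the two expressions can be sketched outside the server. The following is plain JavaScript for Node.js, not MongoDB's implementation; `substrBytes`, `substrCP` and `isContinuationByte` are hypothetical helpers named for illustration, and the sketch assumes non-negative integer arguments. It omits the invalid-UTF-8 check noted for $substrCP, since a JavaScript string is already decoded.

```javascript
// Illustrative sketch in plain JavaScript (Node.js); not MongoDB's implementation.
// substrBytes and substrCP are hypothetical helpers named after the expressions.

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

// Byte-based substring: fails if either boundary splits a code point.
// Assumes start and length are non-negative integers.
function substrBytes(str, start, length) {
  const bytes = Buffer.from(str, "utf8");
  const begin = Math.min(start, bytes.length);
  const end = Math.min(start + length, bytes.length);
  if (begin < bytes.length && isContinuationByte(bytes[begin])) {
    throw new Error("substring starts in the middle of a code point");
  }
  if (end < bytes.length && isContinuationByte(bytes[end])) {
    throw new Error("substring ends in the middle of a code point");
  }
  return bytes.subarray(begin, end).toString("utf8");
}

// Code-point-based substring: Array.from splits a string by code point.
function substrCP(str, start, length) {
  return Array.from(str).slice(start, start + length).join("");
}

console.log(substrBytes("ελληνικά", 0, 4)); // prints ελ
console.log(substrCP("ελληνικά", 0, 4));    // prints ελλη
```

Each Greek letter in the input occupies two bytes in UTF-8, so four bytes cover two letters while four code points cover four. In this sketch, a call such as `substrBytes("ελληνικά", 0, 3)` throws, because byte 3 is the continuation byte of the second letter.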

      Original Description

      The current expression $substr and the proposed expression $length (see SERVER-14670) work in terms of bytes in the string. Sometimes it is desirable to work in terms of code points instead, so we should add equivalent expressions that work with code points.

      For example, {$substr: ["\uD834\uDF06", 0, 1]} would be an error (the one-byte substring would end in the middle of the code point, since the second byte is a continuation byte), but {$cpSubstr: ["\uD834\uDF06", 0, 1]} would be "\uD834\uDF06".

      Correspondingly, {$length: "\uD834\uDF06"} would be 4 (the UTF-8 encoding of this code point is four bytes long), but {$cpLength: "\uD834\uDF06"} would be 1.
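For reference, the three ways of measuring this string can be checked in plain JavaScript (Node.js). This is only a sketch: `byteLength` and `cpLength` are hypothetical helper names, not server expressions. The string holds a single code point, U+1D306, written as a UTF-16 surrogate pair.

```javascript
// Plain JavaScript (Node.js); byteLength and cpLength are hypothetical helper names.
const s = "\uD834\uDF06"; // one code point, U+1D306, written as a surrogate pair

// Length of the string's UTF-8 encoding in bytes.
function byteLength(str) {
  return Buffer.byteLength(str, "utf8");
}

// Number of Unicode code points in the string.
function cpLength(str) {
  return Array.from(str).length;
}

console.log(byteLength(s)); // 4 (UTF-8 bytes F0 9D 8C 86)
console.log(cpLength(s));   // 1 (one code point)
console.log(s.length);      // 2 (UTF-16 code units, a JavaScript-specific count)
```

The count of 2 is what JavaScript's own `length` property reports, which is neither the byte length nor the code point length.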

              People

              Assignee:
              benjamin.murphy Benjamin Murphy
              Reporter:
              charlie.swanson Charlie Swanson