[SERVER-14670] Add expressions to determine the length of a string Created: 24/Jul/14  Updated: 22/Mar/17  Resolved: 29/Mar/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 3.3.4

Type: Improvement Priority: Major - P3
Reporter: Lukas Benes Assignee: Benjamin Murphy
Resolution: Done Votes: 15
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File length-operator.diff    
Issue Links:
Depends
is depended on by CSHARP-1660 Add expressions to determine the leng... Closed
Documented
is documented by DOCS-8709 Document $strLenBytes/$strLenCP Closed
Duplicate
is duplicated by SERVER-5319 provide strlen expression for $project Closed
Related
is related to DRIVERS-297 Aggregation Framework Support for 3.4 Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 12 (04/04/16)
Participants:

 Description   

Syntax

{$strLenBytes: <expression>}
{$strLenCP: <expression>}  // CP stands for Unicode 'code points'.

Examples

Input

{_id: 0, string: "cliché"}

Pipeline

db.coll.aggregate([{
    $project: {
        byteLength: {$strLenBytes: "$string"},
        cpLength: {$strLenCP: "$string"}
    }
}])

Output

{_id: 0, byteLength: 7, cpLength: 6}
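To roughly illustrate what the two expressions count, the following plain JavaScript sketch computes the same values for the example input. It runs in Node.js and is not the server's implementation. It gets the UTF-8 byte count from `TextEncoder` and the Unicode code point count from `Array.from`, which iterates a string by code point. The helper names `strLenBytes` and `strLenCP` are hypothetical stand-ins for the aggregation expressions.

```javascript
// Hypothetical stand-in for $strLenBytes: number of bytes in the
// string's UTF-8 encoding.
function strLenBytes(s) {
  return new TextEncoder().encode(s).length;
}

// Hypothetical stand-in for $strLenCP: number of Unicode code points.
// Array.from iterates by code point, so a surrogate pair counts once.
function strLenCP(s) {
  return Array.from(s).length;
}

// "\u00e9" is the precomposed é, which takes two bytes in UTF-8.
const doc = { _id: 0, string: "clich\u00e9" };

console.log({
  _id: doc._id,
  byteLength: strLenBytes(doc.string), // 7
  cpLength: strLenCP(doc.string),      // 6
});
```

Because é occupies two bytes, the byte count is one higher than the code point count. For pure ASCII strings, such as those in the original description below, the two values are equal.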

Original Description

We need to determine the length of a string in the aggregation pipeline.

e.g.:

db.kol.insert({"text": "abcde"})
db.kol.insert({"text": "ab"})
 
db.kol.aggregate({ $project: { "text_length": {$length: "$text"  }}})

result:

{ "_id" : ObjectId("53d0c9bdc2644cdc0ab835f5"), "text_length" : 5 }
{ "_id" : ObjectId("53d0c9c1c2644cdc0ab835f6"), "text_length" : 2 }



 Comments   
Comment by Githook User [ 29/Mar/16 ]

Author:

Benjamin Murphy (username: benjaminmurphy, email: benjamin_murphy@me.com)

Message: SERVER-14670 Aggregation supports strLenCP and strLenBytes.
Branch: master
https://github.com/mongodb/mongo/commit/6bd6589e806c6545906475510e1b385774676489

Comment by Benjamin Murphy [ 29/Mar/16 ]

This patch adds the $strLenCP and $strLenBytes aggregation expressions, with the syntax shown in the description. They need to be documented and added to any drivers that support aggregation helpers.

Comment by Charlie Swanson [ 11/Mar/16 ]

After some internal discussion, I've updated the description to reflect the agreed-upon design.

Comment by Asya Kamsky [ 02/Oct/15 ]

To add $length, we need to decide whether it returns the length in UTF-8 bytes or in Unicode code points.

I think the important issue is that it should count the same way $substr does, since $length can be used in an expression to compute the arguments for $substr.
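To illustrate the concern above: $substr indexes by UTF-8 bytes (in 3.4 it became an alias for $substrBytes), so any length fed into it must also be measured in bytes. The plain JavaScript sketch below, for Node.js, uses hypothetical helpers `substrBytes`, `substrCP`, `strLenBytes` and `strLenCP` as stand-ins for the corresponding operators. Matching units give the expected result. A code point length used as a byte offset splits the two-byte é.

```javascript
// Hypothetical byte-based substring: slice the UTF-8 bytes, then decode.
// fatal: true makes decoding throw if the slice cuts a multi-byte character.
function substrBytes(s, start, len) {
  const bytes = new TextEncoder().encode(s);
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes.slice(start, start + len));
}

// Hypothetical code-point-based substring.
function substrCP(s, start, len) {
  return Array.from(s).slice(start, start + len).join("");
}

const strLenBytes = (s) => new TextEncoder().encode(s).length; // 7 for s below
const strLenCP = (s) => Array.from(s).length;                  // 6 for s below

const s = "clich\u00e9";

// Consistent units: last code point, using code-point length and offset.
console.log(substrCP(s, strLenCP(s) - 1, 1)); // "é"

// Consistent units: last two bytes, using byte length and byte offset.
console.log(substrBytes(s, strLenBytes(s) - 2, 2)); // "é"

// Mixed units: offset 5 (from the code point length) as a byte offset
// selects only the first byte of é (0xC3), so decoding throws.
try {
  substrBytes(s, strLenCP(s) - 1, 1);
} catch (e) {
  console.log("invalid UTF-8 slice");
}
```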

Comment by Bradley Arsenault [ 11/Jun/15 ]

I second a strong desire for a $length operator for strings. It is useful in a variety of circumstances I have encountered. I would find it useful in the following forms:

1) As a query operator, to match strings of a specific length, or strings longer or shorter than a specific length.
2) As a projection operator, so that I can get the length of a string and pass it to the next stage of the aggregation.
3) As a way of sorting, perhaps like a metadata-based sort: I would like to return objects sorted by the length of a specific field.
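The three uses listed above can be sketched in memory with plain JavaScript. A hypothetical `strLenCP` helper stands in for $strLenCP, and `filter`, `map` and `sort` stand in for matching, projecting and sorting. On the server, the rough equivalent would be a $project stage that computes `{$strLenCP: "$text"}` into a field, followed by $match or $sort on that field. The sample documents are made up for illustration.

```javascript
// Hypothetical stand-in for $strLenCP.
const strLenCP = (s) => Array.from(s).length;

// Made-up sample documents.
const docs = [
  { _id: 1, text: "abcde" },
  { _id: 2, text: "ab" },
  { _id: 3, text: "abcd" },
];

// 1) Match: keep documents whose text is longer than 3 code points.
const longer = docs.filter((d) => strLenCP(d.text) > 3);

// 2) Project: carry the length forward as a new field.
const projected = docs.map((d) => ({ _id: d._id, text_length: strLenCP(d.text) }));

// 3) Sort: order by the projected length, shortest first.
const sorted = [...projected].sort((a, b) => a.text_length - b.text_length);

console.log(longer.map((d) => d._id)); // [ 1, 3 ]
console.log(sorted.map((d) => d._id)); // [ 2, 3, 1 ]
```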

Comment by Asya Kamsky [ 25/Jul/14 ]

I understand the number of characters before '#' is variable, but if that number is bounded (for example, fewer than 20), you can use the trick I describe here:
http://www.kamsky.org/stupid-tricks-with-mongodb/ugly-way-to-parse-a-string-with-aggregation-framework

Comment by David Moravek [ 25/Jul/14 ]

We need to select six characters from the middle of a string of variable length and use them as a group _id. So we ended up with the following $project:

month: {$substr: ["$_id", {$subtract: [{$length: "$_id"}, 10]}, 6]}

The document _id we extract the substring from looks like this: 1234567#201410T-02 (the length of the number before the hash mark varies).

It would be awesome if we could simply write something like this:

month: {$substr: ["$_id", -10, 6]}
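The requested negative-offset behavior amounts to resolving the offset against the string's length before taking the substring. The plain JavaScript sketch below uses a hypothetical `substrFromEnd` helper that counts in code points. With the operators that eventually shipped (3.4), the same idea would be written as `{$substrCP: ["$_id", {$subtract: [{$strLenCP: "$_id"}, 10]}, 6]}`. Because these _id values are ASCII, bytes and code points coincide here, so $substr combined with $strLenBytes would give the same result. The second _id below is hypothetical.

```javascript
// Hypothetical helper: a negative offset counts back from the end of the
// string (in code points); a non-negative offset counts from the start.
function substrFromEnd(s, offset, len) {
  const cps = Array.from(s);
  const start = offset < 0 ? Math.max(cps.length + offset, 0) : offset;
  return cps.slice(start, start + len).join("");
}

// The number before '#' varies in length, but the month always starts
// 10 code points from the end.
console.log(substrFromEnd("1234567#201410T-02", -10, 6)); // "201410"
console.log(substrFromEnd("12#201503T-11", -10, 6));      // "201503"
```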

Comment by Asya Kamsky [ 25/Jul/14 ]

davidmoravek, what is the reason you need $length (since you mention $substr)? Is it to somehow normalize a string value?

Comment by David Moravek [ 24/Jul/14 ]

Hari, the reason we need the $length operator is that $substr does not support a negative offset (because of the string::size_type data type). Is there any other way to achieve this? Thanks

Comment by hari.khalsa@10gen.com [ 24/Jul/14 ]

Thanks for the diff, falsecz. I think we're going to postpone this until we can consider the agg projection language in a more holistic fashion.

Generated at Thu Feb 08 03:35:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.