[SERVER-6801] aggregation $substr expression can output invalid UTF8 Created: 20/Aug/12 Updated: 28/Oct/15 Resolved: 22/May/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 3.1.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aaron Staple | Assignee: | Charlie Swanson |
| Resolution: | Done | Votes: | 1 |
| Labels: | UT, neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Major Change |
| Operating System: | ALL |
| Sprint: | Quint Iteration 4 |
| Participants: |
| Description |
|
$substr will be changed to error out if the requested substring would split a multi-byte code point. In particular, it will error out if the first byte of the result is a continuation byte, or if the last byte is neither a single-byte code point nor the final byte of a multi-byte code point. The implementation may assume that the input string is valid UTF-8.
Original Title: aggregation string functions are not encoding (utf8) aware
Original Description: We might want to prevent the aggregation framework from producing invalid utf8. Potentially we could make $substr operate on utf8 characters rather than bytes.
Test
Output
|
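The boundary rule in the description can be illustrated with a minimal sketch. This is not the server's implementation (which is not shown in this ticket); the function names `is_continuation_byte` and `substr_bytes_checked`, the clamping of out-of-range arguments, and the use of `ValueError` are all assumptions made for illustration. Like the description, it assumes the input is valid UTF-8, so checking whether the byte just past the end is a continuation byte is enough to tell whether the last byte completes a code point.

```python
def is_continuation_byte(b: int) -> bool:
    """Return True for UTF-8 continuation bytes, which have the form 0b10xxxxxx."""
    return (b & 0xC0) == 0x80


def substr_bytes_checked(text: str, start: int, length: int) -> str:
    """Hypothetical byte-indexed substring that refuses to split a code point.

    `start` and `length` are byte offsets into the UTF-8 encoding of `text`.
    Out-of-range values are clamped (an assumption of this sketch).
    """
    data = text.encode("utf-8")
    start = min(max(start, 0), len(data))
    end = min(start + max(length, 0), len(data))
    if start == end:
        return ""
    # Rule 1: the first byte of the result must not be a continuation byte.
    if is_continuation_byte(data[start]):
        raise ValueError("substring starts in the middle of a UTF-8 code point")
    # Rule 2: the last byte must complete a code point. For valid input this
    # holds exactly when the next byte, if any, is not a continuation byte.
    if end < len(data) and is_continuation_byte(data[end]):
        raise ValueError("substring ends in the middle of a UTF-8 code point")
    return data[start:end].decode("utf-8")
```

For example, "aéb" encodes to the four bytes 61 C3 A9 62. Under this sketch, `substr_bytes_checked("aéb", 1, 2)` returns "é", while `substr_bytes_checked("aéb", 0, 2)` and `substr_bytes_checked("aéb", 2, 2)` both raise, because each would cut the two-byte "é" in half.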
| Comments |
| Comment by Githook User [ 22/May/15 ] |
|
Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com> Message: |
| Comment by Mathias Stearn [ 01/Apr/13 ] |
|
This should be easy for a neweng once we add some standard UTF8 helper functions. |
| Comment by auto [ 20/Aug/12 ] |
|
Author: astaple <aaron@10gen.com> Date: 2012-08-20T13:48:53-07:00 Message: |