[SERVER-6086] Unicode/UTF-8 in the shell needs to handle zero-width and double-width characters Created: 13/Jun/12  Updated: 31/Jul/15  Resolved: 30/Jul/14

Status: Closed
Project: Core Server
Component/s: Shell
Affects Version/s: 2.1.1
Fix Version/s: 2.7.5

Type: Bug Priority: Major - P3
Reporter: Tad Marshall Assignee: Benety Goh
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image.png    
Issue Links:
Depends
is depended on by SERVER-14931 linenoise incorrectly duplicates line... Closed
Duplicate
is duplicated by SERVER-14399 linenoise handles backspace incorrect... Closed
Related
related to SERVER-14741 linenoise does not erase characters a... Backlog
related to SERVER-2939 Support Unicode fully in the Mongo sh... Closed
Tested
Operating System: ALL
Sprint: Server 2.7.5
Participants:

 Description   

A combining character sequence in Unicode consists of a base character followed by one or more combining marks. The sequence is encoded as multiple code points but occupies a single position on the screen. The combining marks need to be treated as zero-width characters so that the cursor position matches the displayed text.
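The following is a minimal Python sketch of the zero-width rule, for illustration only. It is not the shell's code, and the helper names are hypothetical. It treats nonspacing (Mn) and enclosing (Me) marks from the standard `unicodedata` module as taking no column, which roughly approximates the rule in Markus Kuhn's `mk_wcwidth`. Double-width characters are deliberately ignored here.

```python
import unicodedata


def is_zero_width(ch: str) -> bool:
    # Nonspacing (Mn) and enclosing (Me) marks attach to the previous
    # character and do not advance the cursor.
    return unicodedata.category(ch) in ("Mn", "Me")


def columns_with_combining(text: str) -> int:
    # Sketch only: every other code point is assumed to be one column wide.
    return sum(0 if is_zero_width(ch) else 1 for ch in text)


decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
print(len(decomposed), columns_with_combining(decomposed))  # 5 4
```

A cursor position derived from `len()` (5) would land one column past the end of the displayed text (4 columns). That mismatch is the one described above.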

Chinese, Japanese and Korean (CJK) characters typically occupy two screen positions, even though each is encoded as a single Unicode code point (usually three bytes in UTF-8). These characters need to be treated as double-width characters so that the cursor position matches the displayed text.
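As a hedged illustration of the double-width rule, the sketch below uses Python's `unicodedata.east_asian_width`. Characters classified as Wide ("W") or Fullwidth ("F") are counted as two columns, and everything else as one. The helper names are hypothetical. Ambiguous-width ("A") characters are treated as single-width here, although some terminals render them as two columns.

```python
import unicodedata


def is_double_width(ch: str) -> bool:
    # Hangul syllables and CJK ideographs are East Asian Width "W".
    return unicodedata.east_asian_width(ch) in ("W", "F")


def columns_with_wide(text: str) -> int:
    return sum(2 if is_double_width(ch) else 1 for ch in text)


hangul = "한글"
# code points, UTF-8 bytes, screen columns
print(len(hangul), len(hangul.encode("utf-8")), columns_with_wide(hangul))  # 2 6 4
```

The byte count, the code-point count, and the column count are three different numbers. Only the column count can be used to position the cursor on screen.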

To reproduce:

Open the mongo shell.
Enter a query whose find document contains a multi-byte string, such as:

> db.item.find({'type':"한글블라블라블라"})

Move the cursor to a few characters before the terminating quotation mark and press backspace.

The terminating quotation mark disappears instead of the character next to the displayed cursor.
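One plausible way to explain these reproduction steps is that the editor places the cursor at one column per code point. The Python sketch below tests that assumption: it is a reconstruction, not the shell's code, and the helper names are hypothetical. It ignores the `> ` prompt. With the buffer position just after the closing quote, the naive column falls inside the double-width Hangul text. The cursor therefore appears several characters before the quote, yet backspace removes the character before the buffer position, which is the quote itself.

```python
import unicodedata


def char_width(ch: str) -> int:
    # Simplified rule: combining marks 0, East Asian wide/fullwidth 2, else 1.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def char_at_column(buf: str, col: int) -> int:
    """Index of the character actually drawn at screen column col."""
    start = 0
    for i, ch in enumerate(buf):
        w = char_width(ch)
        if start <= col < start + w:
            return i
        start += w
    return len(buf)


line = "db.item.find({'type':\"한글블라블라블라\"})"
pos = line.rindex('"') + 1  # buffer position just after the terminating quote

# Assuming one column per code point, the cursor is drawn at column `pos`,
# which actually lies inside the wide Hangul text.
naive_col = pos
print(line[char_at_column(line, naive_col)])  # 블 (where the cursor appears)
print(line[pos - 1])                          # " (what backspace deletes)
```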



 Comments   
Comment by Jianfa Tang (Inactive) [ 31/Jul/14 ]

The cursor navigation works now. When I move the cursor onto the Chinese characters, the entire character is underlined (as opposed to only half of it), and pressing the arrow key once moves the cursor over by exactly one character.

The screenshot (see attachment image.png) shows a cosmetic issue. With the cursor at the last double quotation mark, I pressed backspace 3 times. As expected, 3 Chinese characters were deleted, but some extra curly brackets are shown at the end of the line. If you press Enter, the statement works fine, so the problem is only cosmetic.

Comment by Githook User [ 30/Jul/14 ]

Author: yeaji.shin (username: Bloodevil) <bloodevil4@gmail.com>

Message: SERVER-6086 fixed shell to handle zero-width and double-width characters

recompute character width when refresh line.
use existing function mk_wcswidth.
add fall back when mk_wcswidth return -1 on calculateColumnPosition.
before set position, calculate width of characters.
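The commit message mentions a fallback for when `mk_wcswidth` returns -1 in `calculateColumnPosition`. In Markus Kuhn's `wcwidth.c`, `mk_wcswidth` returns -1 if any character in the string is non-printable, such as a C0/C1 control character. The Python sketch below imitates that contract. `calculate_column_position` is only a hypothetical analogue of the shell function. Its fallback (one column per code point) is an assumption, because the commit message does not say what the real fallback does.

```python
import unicodedata


def wcwidth_sketch(ch: str) -> int:
    """Rough approximation of mk_wcwidth() for a single code point."""
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1  # C0/C1 control characters are non-printable
    if ch != "\u00ad" and unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters (soft hyphen excepted)
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wcswidth_sketch(text: str) -> int:
    """Total width, or -1 if any character is non-printable."""
    total = 0
    for ch in text:
        w = wcwidth_sketch(ch)
        if w < 0:
            return -1
        total += w
    return total


def calculate_column_position(buf: str, pos: int) -> int:
    """Hypothetical analogue: screen column of buffer position pos."""
    width = wcswidth_sketch(buf[:pos])
    if width == -1:
        # Assumed fallback: one column per code point, so the caller never
        # receives a negative column.
        return pos
    return width


print(calculate_column_position("한글", 2))   # 4
print(calculate_column_position("a\tb", 3))  # 3 (tab is non-printable, fallback used)
```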

Closes #717

Signed-off-by: Benety Goh <benety@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/f58fbec3e4418adc6a266f41ad2872451fab760a

Generated at Thu Feb 08 03:10:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.