A combining character sequence in Unicode, such as a base letter followed by a combining accent, consists of two adjacent Unicode code points but occupies a single position on the screen. The second code point of such a pair (the combining character) needs to be treated as zero-width so that the cursor position matches the displayed text.
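As a rough illustration of the zero-width rule, the sketch below counts screen columns while giving nonspacing (Mn) and enclosing (Me) marks a width of zero. It is not linenoise code; the function name `columnsWithCombiningMarks` is hypothetical, and a real editor would also need the double-width handling described next.

```javascript
// Nonspacing (Mn) and enclosing (Me) marks are drawn on top of the
// preceding character, so they should advance the cursor by zero columns.
const ZERO_WIDTH_MARK = /^[\p{Mn}\p{Me}]$/u;

// Hypothetical helper: one column per code point, zero for combining marks.
function columnsWithCombiningMarks(text) {
  let cols = 0;
  for (const ch of text) {            // for...of iterates by code point
    cols += ZERO_WIDTH_MARK.test(ch) ? 0 : 1;
  }
  return cols;
}

const decomposed = "e\u0301";         // "e" + COMBINING ACUTE ACCENT
console.log(Array.from(decomposed).length);         // 2 code points
console.log(columnsWithCombiningMarks(decomposed)); // 1 column
```

Counting code points alone would place the cursor one column too far to the right after the accented letter.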
Chinese, Japanese and Korean (CJK) characters typically occupy two screen positions, although each is encoded as a single Unicode code point. These characters need to be treated as double-width so that the cursor position matches the displayed text.
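The sketch below shows the double-width rule, assuming a hard-coded subset of the wide ranges used in Markus Kuhn's `wcwidth.c`; it is not a complete East Asian Width table and handles only the double-width case. `codePointWidth` and `displayWidth` are hypothetical names. It also shows that byte count, code point count and column count all differ for the Korean text from the reproduction steps.

```javascript
// Double-width ranges: a subset modeled on Markus Kuhn's wcwidth.c.
const WIDE_RANGES = [
  [0x1100, 0x115f],   // Hangul Jamo initial consonants
  [0x2e80, 0x303e],   // CJK radicals .. CJK symbols and punctuation
  [0x3040, 0xa4cf],   // Hiragana, Katakana, CJK ideographs .. Yi
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xfe30, 0xfe6f],   // CJK compatibility forms
  [0xff00, 0xff60],   // Fullwidth forms
  [0xffe0, 0xffe6],   // Fullwidth signs
  [0x20000, 0x2fffd], // Supplementary ideographic plane
  [0x30000, 0x3fffd], // Tertiary ideographic plane
];

// Two columns for code points in a wide range, one otherwise.
function codePointWidth(cp) {
  for (const [lo, hi] of WIDE_RANGES) {
    if (cp >= lo && cp <= hi) return 2;
  }
  return 1;
}

// Sum column widths, iterating by code point rather than UTF-16 code unit.
function displayWidth(text) {
  let cols = 0;
  for (const ch of text) cols += codePointWidth(ch.codePointAt(0));
  return cols;
}

const korean = "한글블라블라블라";
console.log(new TextEncoder().encode(korean).length); // 24 UTF-8 bytes
console.log(Array.from(korean).length);               // 8 code points
console.log(displayWidth(korean));                    // 16 columns
```

A range table with a linear scan keeps the sketch short; a production table would be larger and typically searched with a binary search.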
To reproduce:
1. Open the mongo shell.
2. Enter a find query whose document contains multi-byte characters, for example:
> db.item.find({'type':"한글블라블라블라"})
3. Move the cursor a few characters to the left of the closing quotation mark and press Backspace.
Result: the closing quotation mark disappears.
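The description above attributes the problem to cursor positioning that does not match the displayed text. The following sketch, which is not linenoise's actual code, replays the reproduction step on a hypothetical editor state (a buffer of code points plus a cursor index) and compares the cursor column computed at one column per code point with a width-aware count. For brevity it assumes only Hangul syllables are double-width; `makeState`, `backspace`, `cursorColumn` and the `"> "` prompt width are all assumptions for illustration.

```javascript
// Simplified width rule: Hangul syllables (U+AC00..U+D7A3) take two
// columns, every other code point takes one.
function hangulAwareWidth(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0xac00 && cp <= 0xd7a3 ? 2 : 1;
}

// Naive rule: one column per code point.
function naiveWidth(_ch) {
  return 1;
}

// Hypothetical editor state: buffer of code points, cursor index into it.
function makeState(text, pos) {
  return { buf: Array.from(text), pos };
}

// Backspace deletes the code point immediately before the cursor.
function backspace(state) {
  if (state.pos === 0) return state;
  const buf = state.buf.slice();
  buf.splice(state.pos - 1, 1);
  return { buf, pos: state.pos - 1 };
}

// 0-based screen column of the cursor: prompt width plus the width of
// every code point to the left of the cursor.
function cursorColumn(promptText, state, widthOf) {
  const left = Array.from(promptText).concat(state.buf.slice(0, state.pos));
  return left.reduce((cols, ch) => cols + widthOf(ch), 0);
}

const shellPrompt = "> ";
const line = `db.item.find({'type':"한글블라블라블라"})`;
const closingQuote = Array.from(line).lastIndexOf('"');

// Put the cursor two code points before the closing quote, then backspace.
const after = backspace(makeState(line, closingQuote - 2));

const naive = cursorColumn(shellPrompt, after, naiveWidth);
const aware = cursorColumn(shellPrompt, after, hangulAwareWidth);
console.log(naive); // 29
console.log(aware); // 34
```

The 5-column gap equals the number of double-width characters left of the cursor, which is how far a redraw based on the naive count would misplace the cursor.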
Issue links:
- Is depended on by: SERVER-14931 linenoise incorrectly duplicates lines containing multi-byte characters, when text wraps (Closed)
- Is duplicated by: SERVER-14399 linenoise handles backspace incorrectly with Korean character set (Closed)
- Related to: SERVER-14741 linenoise does not erase characters at end of line when backspacing over multi-byte characters in Windows (Backlog)
- Related to: SERVER-2939 Support Unicode fully in the Mongo shell (was "Linenoise UTF8 support") (Closed)