- Type: Improvement
- Resolution: Works as Designed
- Priority: Minor - P4
- Affects Version/s: None
- Component/s: None
- Query Execution
If a document is added to a collection with a field whose value contains an invalid UTF-8 string, creation of a text index on that field fails with the error "text contains invalid UTF-8". Creation of a normal index succeeds.
(In the scenario I tested, the string contained only the high surrogate of a surrogate pair, with no low surrogate following it. A lone surrogate cannot be represented in valid UTF-8.)
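To illustrate why a lone high surrogate counts as invalid UTF-8, here is a minimal Python sketch. It is not taken from the server or driver code. The helper names (`is_valid_utf8`, `strict_encode_fails`) and the sample string are hypothetical, and the exact bytes a given driver writes for such a string are an assumption: the sketch uses Python's `surrogatepass` error handler only as one way to produce the 3-byte surrogate encoding.

```python
# Hypothetical sample: U+D83D is the high surrogate of the pair for U+1F600.
HIGH_SURROGATE = "\ud83d"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode under strict UTF-8 rules."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strict_encode_fails(text: str) -> bool:
    """Return True if a strict UTF-8 encoder rejects the string."""
    try:
        text.encode("utf-8")
        return False
    except UnicodeEncodeError:
        return True


# Assumption: the writer emits the surrogate as a 3-byte sequence instead of
# rejecting it. "surrogatepass" reproduces that behaviour in Python.
lone_bytes = ("abc" + HIGH_SURROGATE).encode("utf-8", "surrogatepass")

# The complete code point (both surrogates combined) encodes to valid UTF-8.
paired_bytes = "abc\U0001F600".encode("utf-8")

print(is_valid_utf8(lone_bytes))    # strict decoding rejects the lone surrogate
print(is_valid_utf8(paired_bytes))  # the full code point is accepted
```

A strict validator, such as the one a text indexer might apply, rejects the first byte string and accepts the second. A writer that does not validate would store both.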
Given that the server allows invalid UTF-8 strings to be inserted into the database, we should consider whether the server should be more resilient to the presence of invalid UTF-8 strings when creating text indices.
It would also be reasonable to close this as Works as Designed, as I imagine it's fairly rare for this to happen in practice.
Related to:
- SERVER-62871 [4.4] Improve handling of text index creation in the presence of invalid UTF-8 (Closed)
- JAVA-4431 Driver allows inserting invalid UTF-8 strings (Closed)