  1. Core Server
  2. SERVER-62348

Text index creation fails with error "text contains invalid UTF-8"

    • Type: Improvement
    • Resolution: Works as Designed
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Execution

      If a collection contains a document with a string field whose value is not valid UTF-8, creating a text index on that field fails with the error "text contains invalid UTF-8". Creating a normal (non-text) index on the same field succeeds.

      (In the scenario I tested, the string was invalid because it contained only the high surrogate of a surrogate pair.)
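To illustrate why a lone high surrogate is a problem, here is a minimal Python sketch. It is not the reproduction used for this ticket, and the choice of U+D83D is an assumption made for illustration. It shows that a strict UTF-8 codec rejects a lone surrogate in both directions, while a lenient encoder can still emit bytes for it.

```python
# A lone high surrogate (U+D83D, chosen for illustration) with no
# low surrogate following it.
lone_high_surrogate = "\ud83d"

# A strict UTF-8 encoder refuses to encode a lone surrogate.
try:
    lone_high_surrogate.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The "surrogatepass" error handler emits the three-byte form anyway,
# which is how a lenient encoder could produce such bytes.
raw_bytes = lone_high_surrogate.encode("utf-8", "surrogatepass")

# A strict UTF-8 decoder rejects those bytes.
try:
    raw_bytes.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False
```

Any component that validates strings strictly, as text index creation appears to do here, would reject bytes like these even though they were accepted at insert time.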

      Given that the server allows invalid UTF-8 strings to be inserted into the database, we should consider whether the server should be more resilient to the presence of invalid UTF-8 strings when creating text indices.
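Until the server changes, one possible client-side mitigation is to check string bytes before they are stored or indexed. The sketch below is an assumption-laden illustration, not server or driver behavior: `is_valid_utf8` and `fields_with_invalid_utf8` are hypothetical helper names, and the document is modeled as a plain mapping from field name to raw bytes.

```python
from typing import Dict, List


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes under strict UTF-8 rules."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def fields_with_invalid_utf8(raw_fields: Dict[str, bytes]) -> List[str]:
    """List the field names whose bytes are not valid UTF-8 (hypothetical helper)."""
    return [name for name, raw in raw_fields.items() if not is_valid_utf8(raw)]


# Illustrative data: "body" holds the bytes of a lone high surrogate.
sample = {"title": "héllo".encode("utf-8"), "body": b"\xed\xa0\xbd"}
offending = fields_with_invalid_utf8(sample)
```

A check like this only finds the offending fields. Whether to reject, repair, or skip them remains an application decision.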

      It would also be reasonable to close this as Works as Designed, as I imagine it's fairly rare for this to happen in practice.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            jeff.yemin@mongodb.com Jeffrey Yemin
            Votes:
            0
            Watchers:
            8
