Core Server / SERVER-62348

Text index creation fails with error "text contains invalid UTF-8"

Details

    • Improvement
    • Status: Closed
    • Minor - P4
    • Resolution: Works as Designed

    Description

      If a document is inserted into a collection with a field whose value is an invalid UTF-8 string, creating a text index on that field fails with the error "text contains invalid UTF-8". Creating a regular (non-text) index succeeds.

      (In the scenario I tested, the string contained only the high surrogate of a surrogate pair. A lone surrogate has no valid UTF-8 encoding, so the stored bytes are invalid UTF-8.)
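
To illustrate why a lone high surrogate counts as invalid UTF-8, here is a minimal Python sketch using only the standard library. It does not touch the server; the helper name `is_valid_utf8` and the sample code points are assumptions chosen for illustration, and the server's own validation logic may differ in detail.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes under strict UTF-8 rules (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict mode rejects encoded surrogates
        return True
    except UnicodeDecodeError:
        return False


# U+1F600 is written in UTF-16 as the surrogate pair D83D DE00.
# Encoded as UTF-8 it is a single, valid 4-byte sequence.
full_pair = "\U0001F600".encode("utf-8")

# Only the high surrogate U+D83D, forced into a 3-byte form with the
# "surrogatepass" error handler. Strict UTF-8 forbids this sequence.
lone_high_surrogate = "\ud83d".encode("utf-8", "surrogatepass")

print(is_valid_utf8(full_pair))            # True
print(is_valid_utf8(lone_high_surrogate))  # False
```

A client-side check along these lines could flag such strings before they are inserted, though whether that is appropriate depends on the application.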

      Given that the server allows invalid UTF-8 strings to be inserted into the database, we should consider whether the server should be more resilient to the presence of invalid UTF-8 strings when creating text indices.

      It would also be reasonable to close this as Works as Designed, as I imagine it's fairly rare for this to happen in practice.

            People

              backlog-query-execution Backlog - Query Execution
              jeff.yemin@mongodb.com Jeffrey Yemin
              Votes: 0
              Watchers: 8
