QE - Case and diacritic sensitivity not honoured for explicit encryption


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical - P2
    • Component/s: Client Side Encryption
    • Needed
      Key Status/Resolution FixVersion
      CDRIVER-6327 Blocked
      CXX-3489 Blocked
      CSHARP-6030 Blocked
      GODRIVER-3904 Blocked
      JAVA-6196 Blocked
      NODE-7578 Blocked
      PYTHON-5820 Blocked
      PHPLIB-1846 Blocked
      RUBY-3875 Blocked
      RUST-2421 Blocked

      Summary

      When performing explicit encryption of a text-indexed field with Queryable Encryption via ClientEncryption.encrypt, the case sensitivity setting, and possibly the diacritic sensitivity setting, is not honoured. This appears to be an issue in libmongocrypt rather than in the drivers themselves.

      As an example, I have a DEK that is "b7csKuW8B1zoGeA+JLg3puwpBiMMig/Pk/k707SgFmNa5pQmW5pHT8JKKShQ8Myl7jZ5Hzy2l3oCqqSUgmUDRCxcp2/j7Y7GT/F55dTEjeu5tf4WCZuBZ5qBcBQ7FW1X" and I perform the following:

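          from pymongo.encryption import Algorithm
          from pymongo.encryption_options import TextOpts

          # client_encryption (a ClientEncryption instance) and firstname_dek
          # (the data key document) are set up elsewhere in the script.
          prefix_text_opts = TextOpts(
              prefix={
                  "strMinQueryLength": 2,
                  "strMaxQueryLength": 6,
              },
              case_sensitive=False,
              diacritic_sensitive=False,
          )
          # With case_sensitive=False, "Sarah" and "sarah" should yield the same tokens.
          ciphertext_firstname = client_encryption.encrypt(
              "Sarah",
              algorithm=Algorithm.TEXTPREVIEW,
              key_id=firstname_dek["_id"],
              contention_factor=0,
              text_opts=prefix_text_opts,
          )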
      

      The b.e.s field in the resulting FLE2InsertUpdatePayloadV2 should be a5b6c1ffb119c01f194ead53674ee15d0df0862212c2800a6c4debecbc7543a0, but instead it is 146704c5b43a7cabfd312b62512b55993ddbb0209b55d6a08d8daa4524ddbff1. If I encrypt the string sarah instead, the b.e.s field is correct, which shows that case_sensitive=False is not honoured. Similarly, the b.e.d field should be 56949428f63f5647ccd0821705ddceeee6dc5456c0670e76cfdc5001dece3221, but it is b5b0a72993be17500ccd1a39b46722828f3524b275627172baba25f712919207 unless I change the case of the input manually.

      I believe the same issue occurs for the prefix tokens (b.p.s and b.p.d) and I assume this also occurs for the other text-indexed types.
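
      To make the expected behaviour concrete, here is a minimal sketch of the kind of folding that case_sensitive=False and diacritic_sensitive=False imply for the plaintext before tokens are derived. The function name fold_for_text_index is hypothetical, and the exact normalisation rules libmongocrypt applies are an assumption here; this only illustrates why "Sarah" and "sarah" should produce identical tokens.

```python
import unicodedata


def fold_for_text_index(value: str, case_sensitive: bool, diacritic_sensitive: bool) -> str:
    """Illustrative folding only; libmongocrypt's real rules may differ."""
    if not diacritic_sensitive:
        # Decompose characters, then drop combining marks (e.g. "ë" -> "e").
        decomposed = unicodedata.normalize("NFD", value)
        value = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not case_sensitive:
        # casefold() is a stronger lower() intended for caseless matching.
        value = value.casefold()
    return value


if __name__ == "__main__":
    print(fold_for_text_index("Sarah", case_sensitive=False, diacritic_sensitive=False))
```

      Applying a fold like this to the input by hand before calling encrypt corresponds to the manual workaround described above, where encrypting sarah yields the correct b.e.s value.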

      Motivation

      Who is the affected end user?

      Any developer using explicit encryption with text-indexed fields is affected by this issue: queries will fail to return documents that should be returned. A query that uses explicit encryption will only return the manually encrypted documents whose search term matched the case sensitivity settings before encryption. The same applies to queries that use auto encryption.

      How does this affect the end user?

      The end user will not receive all the documents they expect when performing a $match with $encStrNormalizedEq or the other QE text search operators. Correct results are critical for end-user confidence in the feature.

      How likely is it that this problem or use case will occur?

      It will occur for anyone using explicit encryption with text-indexed fields.

      If the problem does occur, what are the consequences and how severe are they?

      Incorrect documents will be returned or there will be missing documents from the query.

      Is this issue urgent?

      I recommend this be fixed before GA.

      Is this ticket required by a downstream team?

      Unknown

      Is this ticket only for tests?

      No. However, I recommend creating new tests to ensure that the token creation is correct.
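
      As a rough illustration of what such a test could check, the sketch below compares the token fields named in this ticket between two decoded payloads. The helper names (get_path, mismatched_tokens) are hypothetical, the nested-dict layout is an assumption based on the field paths quoted above, and decoding the FLE2InsertUpdatePayloadV2 into a dict is left out. It also assumes tokens are reproducible for a fixed DEK with contention_factor=0, as the comparison in the Summary implies.

```python
from typing import Any, Mapping, Sequence

# Token paths named in this ticket; the nested layout is an assumption.
TOKEN_PATHS = ("b.e.s", "b.e.d")


def get_path(doc: Mapping[str, Any], dotted: str) -> Any:
    """Walk a dotted path such as "b.e.s" through nested mappings."""
    current: Any = doc
    for key in dotted.split("."):
        current = current[key]
    return current


def mismatched_tokens(
    payload_a: Mapping[str, Any],
    payload_b: Mapping[str, Any],
    paths: Sequence[str] = TOKEN_PATHS,
) -> list[str]:
    """Return the token paths whose values differ between two payloads."""
    return [p for p in paths if get_path(payload_a, p) != get_path(payload_b, p)]


# Values reported in this ticket for "Sarah" with case_sensitive=False.
expected = {"b": {"e": {
    "s": "a5b6c1ffb119c01f194ead53674ee15d0df0862212c2800a6c4debecbc7543a0",
    "d": "56949428f63f5647ccd0821705ddceeee6dc5456c0670e76cfdc5001dece3221",
}}}
observed = {"b": {"e": {
    "s": "146704c5b43a7cabfd312b62512b55993ddbb0209b55d6a08d8daa4524ddbff1",
    "d": "b5b0a72993be17500ccd1a39b46722828f3524b275627172baba25f712919207",
}}}

if __name__ == "__main__":
    print(mismatched_tokens(expected, observed))
```

      A regression test along these lines would encrypt both "Sarah" and "sarah" with case_sensitive=False and assert that mismatched_tokens returns an empty list; the same comparison could be extended to the prefix token paths once their layout is confirmed.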

      Acceptance Criteria

      Explicit encryption must adhere to the settings in the TextOpts and ideally to the Encrypted Fields Map that the end collection has set.

      I have code in Python and Go that demonstrates this issue that I can provide if required.

        1. just_why_creator.py
          4 kB
          Brett Gray
        2. just_why.py
          7 kB
          Brett Gray

            Assignee:
            Adrian Dole
            Reporter:
            Brett Gray
            Kevin Albertson