Convenient CSE API for explicit encryption

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Component/s: Client Side Encryption

      While implementing CSE in Ruby, I noticed that although the existing CSE API for explicit encryption supports a variety of use cases, it can be verbose for some of them. This ticket collects some ideas for streamlining the API. It is not clear to me whether an alternative API needs to be implemented; that depends on the use cases that are part of the CSE story and on actual user feedback.

      The current API for explicit encryption, as specified in the CSE spec, looks like the following. We assume the data key was already generated and its id is provided in an environment variable, and the user wants to encrypt and decrypt some data.

      # Client created elsewhere
      client = Mongo::Client.new(...)
      
      client_encryption_options = {
        kms_providers: {
          local: { key: "ruby" * 24 }
        },
        key_vault_namespace: 'keys.keys',
      }
      
      # There is no way to reuse an already created client encryption object
      # with the same options.
      client_encryption = Mongo::ClientEncryption.new(
        client,
        client_encryption_options,
      )
      
      # Data key id must be manually converted from string to BSON::Binary.
      # User must also specify :uuid subtype for BSON::Binary.
      data_key_id = BSON::Binary.new(ENV['DATA_KEY_ID'], :uuid)
      
      encrypted = client_encryption.encrypt(
        'Hello, world!',
        # Data key id must be provided with each encrypt call.
        key_id: data_key_id,
        # The algorithm is always the same - AEAD_AES_256_CBC_HMAC_SHA_512 -
        # but must be provided, as a string or a constant which typically has
        # the same characters, to each call. The actual tunable is whether
        # encryption is deterministic or probabilistic, which is a boolean.
        algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
      )
      
      expect(encrypted).to be_a_kind_of(BSON::Binary)
      
      decrypted = client_encryption.decrypt(encrypted)
      expect(decrypted).to eq('Hello, world!')
      
      client_encryption.close
      client.close
      
      

      The comments in the code above make the following points:

      • A client encryption object must be constructed explicitly each time the user wants to encrypt or decrypt. If the user wants to cache the client encryption object, they must do so manually.
      • The data key id must be provided as a BSON binary. The application will generally retrieve it as a string from some storage medium, such as an environment variable, and must then convert the string to a BSON binary manually, remembering to specify the correct subtype.
      • Each encrypt call requires the data key id, even if the application only ever uses a single data key.
      • Each encrypt call requires the algorithm. Only one algorithm is actually supported (AEAD_AES_256_CBC_HMAC_SHA_512), with two modes: deterministic and probabilistic. The application will nevertheless generally reference the entire "AEAD_AES_256_CBC_HMAC_SHA_512" string, either as a string literal or as a constant (which in the Python driver, for example, is the exact same string with different capitalization), when all it really specifies is a boolean: whether to encrypt deterministically.
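
      The two conversions described above (string to binary key id, boolean to algorithm name) can be sketched in plain Ruby. This is a rough illustration, not driver code. The helper names algorithm_for and uuid_string_to_bytes are hypothetical, and the sketch assumes the data key id is stored in its textual UUID form, which the ticket does not specify. "Random" is the suffix the CSE spec uses for the probabilistic mode.

```ruby
# Hypothetical helpers (not part of the Ruby driver). They only prepare the
# arguments that the existing encrypt call expects.

ALGORITHM_PREFIX = 'AEAD_AES_256_CBC_HMAC_SHA_512'

# Maps the boolean tunable to the full algorithm string.
def algorithm_for(deterministic)
  suffix = deterministic ? 'Deterministic' : 'Random'
  "#{ALGORITHM_PREFIX}-#{suffix}"
end

# Converts a textual UUID (assumed storage format) to its 16 raw bytes.
# Raises ArgumentError if the input is not 32 hex digits once dashes
# are removed.
def uuid_string_to_bytes(uuid)
  hex = uuid.delete('-')
  unless hex.match?(/\A\h{32}\z/)
    raise ArgumentError, "not a UUID: #{uuid.inspect}"
  end
  [hex].pack('H*')
end
```

      With the bson gem loaded, the resulting bytes would be wrapped as BSON::Binary.new(uuid_string_to_bytes(ENV['DATA_KEY_ID']), :uuid) and passed as key_id, with algorithm_for(true) or algorithm_for(false) passed as algorithm.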

      Consider the following potential alternate API:

      # Client created elsewhere
      client = Mongo::Client.new(...)
      
      # Create a data key
      client_encryption_options = {
        kms_providers: {
          local: { key: "ruby" * 24 }
        },
        key_vault_namespace: 'keys.keys',
      }
      
      client.crypt(client_encryption_options) do |client_encryption|
      
        # Data key is associated with the algorithm used.
        # Data key id is returned as a string.
        data_key_id = client_encryption.create_data_key(
          algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512',
        )
        ENV['DATA_KEY_ID'] = data_key_id
      
      end
      
      # All encryption options can be specified on the client encryption object.
      # Data key and deterministic flag can be overridden later.
      client_encryption_options = {
        kms_providers: {
          local: { key: "ruby" * 24 }
        },
        key_vault_namespace: 'keys.keys',
        # Defaults for data key id & encryption mode
        # Key is accepted as a string
        data_key_id: ENV['DATA_KEY_ID'],
        deterministic: false,
      }
      
      # Client encryption is created from the client and can be owned by the client,
      # making it easy to reuse client encryption objects.
      client.crypt(client_encryption_options) do |client_encryption|
      
        # Probabilistic
        encrypted = client_encryption.encrypt(
          'Hello, world!',
        )
      
        # Deterministic
        encrypted = client_encryption.encrypt(
          'Hello, world!',
          # Key & encryption mode can be overridden
          deterministic: true,
        )
      
        expect(encrypted).to be_a_kind_of(BSON::Binary)
      
        decrypted = client_encryption.decrypt(encrypted)
        expect(decrypted).to eq('Hello, world!')
      
      end
      
      client.close
      

      In this version, the following changes are made:

      • All encryption options are specified on the client encryption object. The key id and mode can be overridden per operation.
      • The client encryption object is created from the client, making it easy for the client to cache client encryption objects.
      • Because the client encryption object is created only the first time it is needed, it no longer requires an explicit close call.
      • The data key id is accepted as a string. The driver converts it to a BSON binary of the correct subtype as needed.
      • Since the length of an encryption key is generally determined by the algorithm the key is used with, the encryption algorithm is specified with the key. When encrypting, only the encryption mode (deterministic or probabilistic) then needs to be specified. Multiple algorithms could use the same key length, but in the general case they will not. Judging by the SCRAM-SHA-1 and SCRAM-SHA-256 authentication mechanisms, it seems likely to me that a new encryption algorithm would be added to allow a larger key length.
      • Encrypting plain text with the default options (key and mode) requires only the plain text as the argument.
      • The data key id and encryption mode can still be specified per operation if desired.
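
      The defaults-plus-overrides behaviour listed above could be prototyped on top of the existing API without driver changes. The following is a minimal sketch under stated assumptions. ConvenientEncryption and its open method are hypothetical names. The wrapped object is assumed only to respond to encrypt(value, key_id:, algorithm:), decrypt(value) and close, as in the first example. The data key id is passed through unchanged. Unlike the proposal, the sketch does not associate the algorithm with the key; it only maps the deterministic flag to the one algorithm name used in this ticket.

```ruby
# Hypothetical wrapper illustrating per-object defaults with per-call
# overrides. It is not part of the Ruby driver.
class ConvenientEncryption
  ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_512'

  # Yields a wrapper and closes the underlying object when the block exits,
  # so the caller never issues an explicit close.
  def self.open(client_encryption, **defaults)
    yield new(client_encryption, **defaults)
  ensure
    client_encryption.close
  end

  # client_encryption: any object responding to
  # encrypt(value, key_id:, algorithm:), decrypt(value) and close.
  def initialize(client_encryption, **defaults)
    @client_encryption = client_encryption
    # Probabilistic encryption is the fallback when no mode is given.
    @defaults = { deterministic: false }.merge(defaults)
  end

  # Per-call overrides take precedence over the defaults.
  def encrypt(value, **overrides)
    options = @defaults.merge(overrides)
    key_id = options.fetch(:data_key_id) do
      raise ArgumentError, 'no data key id given and no default configured'
    end
    mode = options[:deterministic] ? 'Deterministic' : 'Random'
    @client_encryption.encrypt(
      value,
      key_id: key_id,
      algorithm: "#{ALGORITHM}-#{mode}"
    )
  end

  def decrypt(value)
    @client_encryption.decrypt(value)
  end
end
```

      A caller would then write ConvenientEncryption.open(client_encryption, data_key_id: data_key_id) { |ce| ce.encrypt('Hello, world!') } and pass deterministic: true only on the calls that need it.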

      For the avoidance of doubt, the purpose of this ticket is only to point out a possible alternative API, not to suggest that it must be implemented now.

      cc emily.giurleo asya kenneth.white

              Assignee: Unassigned
              Reporter: Oleg Pudeyev (Inactive)
              Votes: 0
              Watchers: 4