[JAVA-4046] GridFS & Client-Side Field Level Encryption Created: 15/Mar/21  Updated: 27/Oct/23  Resolved: 19/Apr/21

Status: Closed
Project: Java Driver
Component/s: Client Side Encryption, GridFS
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Laurent Charlois Assignee: Jeffrey Yemin
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Goal

Be able to use explicit (manual) encryption for files uploaded through the GridFS driver.

Why

If I'm not wrong, the only way to encrypt/decrypt files with GridFS is to use automatic CSFLE. It requires creating a new client each time and installing mongocryptd (from MongoDB Enterprise) on the host.

Due to the way we store data in MongoDB (multiple databases), we don't want to have to create a new MongoClient each time.

How

By passing the information required to encrypt/decrypt data to the GridFSUploadOptions and GridFSDownloadOptions.
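A hypothetical sketch of what such an API could look like (the `encryptionOptions` setter and the `GridFSEncryptionOptions` type below do not exist in the driver; they are illustrative only):

```java
// Hypothetical API sketch -- neither encryptionOptions() nor
// GridFSEncryptionOptions exists in the Java driver today.
GridFSUploadOptions uploadOptions = new GridFSUploadOptions()
        .chunkSizeBytes(255 * 1024)
        // Imagined setter: asks the bucket to explicitly encrypt each
        // chunk's `data` field with the given data key and algorithm.
        .encryptionOptions(new GridFSEncryptionOptions(clientEncryption)
                .keyId(dataKeyId)
                .algorithm("AEAD_AES_256_CBC_HMAC_SHA_512-Random"));
```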

If this request is accepted and you provide me basic insights on how you are expecting this feature to be implemented, I will be happy to work on it and submit a pull request on the Github repository.

 Comments   
Comment by Backlog - Core Eng Program Management Team [ 19/Apr/21 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Jeffrey Yemin [ 02/Apr/21 ]

Hi laurent.charlois@neotys.com

Thanks for clarifying your use case. Now that I understand better, I have a workaround for you to consider that I think will allow you to do what you need without waiting for any changes in the driver.

To encrypt a file:

  1. Read a chunk at a time from input stream
  2. Encrypt the chunk using explicit encryption
  3. Write the size of the encrypted chunk to the upload stream
  4. Write the encrypted chunk to the upload stream
  5. Goto 1 until EOF

Then to decrypt a file:

  1. Read size of encrypted chunk from download stream
  2. Read encrypted chunk from download stream
  3. Decrypt chunk using explicit decryption
  4. Write decrypted chunk to output stream
  5. Goto 1 until EOF

This is what it would look like in Java:

    public static void main(String[] args) throws IOException {
 
        String inputFileName = args[0];
        String outputFileName = args[1];
 
        // For testing purposes, use a local master key
        byte[] localMasterKey = new byte[96];
        new SecureRandom().nextBytes(localMasterKey);
 
        Map<String, Map<String, Object>> kmsProviders = new HashMap<>() {{
            put("local", new HashMap<>() {{
                put("key", localMasterKey);
            }});
        }};
 
        // Configure client settings here
        var settings = MongoClientSettings.builder()
                .build();
 
        try (var encryption = ClientEncryptions.create(
                ClientEncryptionSettings.builder()
                        .keyVaultMongoClientSettings(settings)
                        .keyVaultNamespace("keyvault.keyvault")
                        .kmsProviders(kmsProviders)
                        .build());
             var client = MongoClients.create(settings)) {
 
            // For testing purposes, make a clean slate
            client.getDatabase("keyvault").getCollection("keyvault").drop();
            MongoDatabase gridfsDatabase = client.getDatabase("gridfs");
            gridfsDatabase.drop();
 
            // For testing purposes, just create a brand new data key
            BsonBinary dataKeyId = encryption.createDataKey("local", new DataKeyOptions());
 
            var gridfs = GridFSBuckets.create(gridfsDatabase);
 
            // Encrypt the file, one chunk at a time
            try (var uploadStream = gridfs.openUploadStream(inputFileName);
                 var file = new FileInputStream(inputFileName)) {
 
                var buffer = new byte[gridfs.getChunkSizeBytes() - 82];
 
                // Read a chunk of the unencrypted file
                int bytesRead = file.read(buffer);  // TODO: ensure buffer is as full as possible
 
                while (bytesRead != -1) {
                    // Encrypt the chunk
                    var encrypted = encryption.encrypt(new BsonBinary(truncate(buffer, bytesRead)),
                            new EncryptOptions("AEAD_AES_256_CBC_HMAC_SHA_512-Random")
                                    .keyId(dataKeyId)
                    );
 
                    // Write the length of the encrypted chunk, then write the encrypted chunk itself
                    uploadStream.write(ByteBuffer.allocate(4).putInt(encrypted.getData().length).array());
                    uploadStream.write(encrypted.getData());
 
                    bytesRead = file.read(buffer);
                }
            }
 
            // Now decrypt the file, one chunk at a time
            try (var downloadStream = new DataInputStream(gridfs.openDownloadStream(inputFileName));
                 var file = new FileOutputStream(outputFileName)) {
 
                var chunkSizeBuffer = new byte[4];
 
                while (true) {
                    // Read the size of the next encrypted chunk
                    downloadStream.readFully(chunkSizeBuffer);
                    int chunkSize = ByteBuffer.wrap(chunkSizeBuffer).getInt();
                    // Read the next encrypted chunk
                    byte[] encryptedChunk = new byte[chunkSize];
                    downloadStream.readFully(encryptedChunk);
 
                    // Decrypt the chunk and write it to the output file
                    var decryptedValue = encryption.decrypt(new BsonBinary(encryptedChunk));
                    file.write(decryptedValue.asBinary().getData());
                }
            } catch (EOFException e) {
                // This means we've reached the end of the file. It's weird to use an exception as control flow,
                // but it's the price of using DataInputStream.readFully in this scenario
            }
        }
    }
 
    // Truncate the buffer to number of bytes actually read
    private static byte[] truncate(byte[] buffer, int bytesRead) {
        if (buffer.length == bytesRead) {
            return buffer;
        } else {
            return Arrays.copyOf(buffer, bytesRead);
        }
    }
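The length-prefix framing in the code above can be exercised on its own, without a server. A minimal sketch using only the JDK (the `FramingDemo` class and its helpers are mine, not part of the driver):

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FramingDemo {

    // Prefix a chunk with its length as a 4-byte big-endian int,
    // mirroring the ByteBuffer usage in the upload loop above.
    static byte[] frame(byte[] chunk) {
        return ByteBuffer.allocate(4 + chunk.length)
                .putInt(chunk.length)
                .put(chunk)
                .array();
    }

    // Split a concatenation of framed chunks back into the original
    // chunks, mirroring the read loop on the download side.
    static List<byte[]> deframe(byte[] stream) {
        ByteBuffer buffer = ByteBuffer.wrap(stream);
        List<byte[]> chunks = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] chunk = new byte[buffer.getInt()];
            buffer.get(chunk);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {"hello", "gridfs"}) {
            out.writeBytes(frame(s.getBytes(StandardCharsets.UTF_8)));
        }
        for (byte[] chunk : deframe(out.toByteArray())) {
            System.out.println(new String(chunk, StandardCharsets.UTF_8));
        }
    }
}
```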

Please have a look and let me know if this would work for you.

Comment by Laurent Charlois [ 02/Apr/21 ]

Hi Jeffrey,


Our application is multi-tenant.
We use one logical database per tenant (inside the same replicaset).

We want to use one master key per tenant, and we want to use a master key provided by the tenant.

Customer A uses database A.
Customer A gives us a configuration for their KMS master key, and we then use this customer-specific configuration to encrypt their data.


My understanding of the feature is that in automatic mode you need to specify the configuration for retrieving the master key from the KMS when you create the MongoDB client.

This means that if we want to use automatic encryption mode, we need a different MongoDB client for each customer.
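For context, a sketch of the coupling described here, assuming the standard driver API (AutoEncryptionSettings is bound to a MongoClient at construction, so each tenant's KMS configuration would need its own client; the `clientForTenant` helper is illustrative, not part of any API):

```java
// Sketch only: automatic CSFLE binds the KMS configuration to the
// MongoClient itself, so one client per tenant would be required.
MongoClient clientForTenant(Map<String, Map<String, Object>> tenantKmsProviders) {
    AutoEncryptionSettings autoEncryption = AutoEncryptionSettings.builder()
            .keyVaultNamespace("keyvault.datakeys")
            .kmsProviders(tenantKmsProviders) // tenant-specific KMS config
            .build();
    return MongoClients.create(MongoClientSettings.builder()
            .autoEncryptionSettings(autoEncryption)
            .build());
}
```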


Let me know if it's still unclear.

Laurent.

Comment by Jeffrey Yemin [ 31/Mar/21 ]

Hi laurent.charlois@neotys.com

> Due to the way we store data in MongoDB (multiple database), we don't want to have to create a new MongoClient each time.

Can you clarify what you mean by that statement? When you say multiple databases, do you mean multiple logical databases in a single deployment (e.g. a single replica set or sharded cluster), or multiple deployments? If the former, I'm not clear on why you'd need a new MongoClient each time, and if the latter, it's unavoidable.

Comment by Laurent Charlois [ 24/Mar/21 ]

Thanks for the answer.

We are looking to encrypt the file data (field `data` of chunk documents).

I was able to make it work using the CSFLE automatic mode, but my need (due to our design) is to be able to use manual mode instead.

Laurent.

Comment by Ross Lawley [ 24/Mar/21 ]

Hi laurent.charlois@neotys.com,

Thanks for the ticket, and apologies for the slow response.

Our GridFS Implementation is based on the GridFS Specification. Any changes to our implementation would require changes to the specification first before being implemented in the drivers.

What information are you looking to have encrypted/decrypted? The file data itself, or the metadata?

Ross

Generated at Thu Feb 08 09:01:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.