[JAVA-5297] ClientEncryption encrypt/decrypt KMS key caching Created: 22/Jan/24  Updated: 01/Feb/24

Status: Backlog
Project: Java Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Unknown
Reporter: Ivan Zaitsev Assignee: Jeffrey Yemin
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DRIVERS-2781 Add option to configure DEK cache lif... Backlog

 Comments   
Comment by Kevin Albertson [ 30/Jan/24 ]

Thank you for the feedback, orxorandnot@gmail.com. This ticket is now referenced in DRIVERS-2781 as further motivation. I recommend watching DRIVERS-2781 for updates.

Comment by Ivan Zaitsev [ 30/Jan/24 ]

Hello kevin.albertson@mongodb.com 

  • How often are the errors observed?

Multiple times within a one-week period. I noticed that other people also experienced the same problem within the same period, here.

  • Is the expected rate of requests to Azure known?

I don't think it depends on how many instances access Azure. It is more likely a transient Azure error, which can happen even if an instance is idle or has a low rate of requests.

  • Is the expected latency to the Azure key vault known?

It usually responds really fast, something close to your results. The problem is that even if Azure Key Vault has a good SLA, occasional transient errors of this kind would still affect application instances.

 

I think it would be really good to have the DEK caching time configurable.

 

Comment by Kevin Albertson [ 30/Jan/24 ]

Hi orxorandnot@gmail.com, thank you for the report.

There have been other reports of observed transient KMS errors. DRIVERS-2781 proposes adding an option to configure the DEK cache lifetime. I suggest watching DRIVERS-2781 for updates. I expect a change would be needed within libmongocrypt to modify the caching behavior.

If possible, more information about the conditions leading to the error may help to reproduce and test a solution:

  • How often are the errors observed?
  • Is the expected rate of requests to Azure known? E.g. are there many instances of the application making concurrent requests to the Azure key vault?
  • Is the expected latency to the Azure key vault known? E.g. are requests typically taking several seconds? I ran this test to repeatedly use a ClientEncryption to decrypt a DEK with an Azure key vault using an RSA 2048 key in the US East region (closest to me). Here are my results:

Total requests run : 1000
Duration : 386.25s
Avg requests/sec : 2.59
Max request time : 1.89s
Median request time : 0.35s
Histogram
[0.00-0.19s) : 0 (0.00%)
[0.19-0.38s) : 702 (70.20%)
[0.38-0.57s) : 221 (22.10%)
[0.57-0.76s) : 63 (6.30%)
[0.76-0.94s) : 9 (0.90%)
[0.94-1.13s) : 3 (0.30%)
[1.13-1.32s) : 0 (0.00%)
[1.32-1.51s) : 1 (0.10%)
[1.51-1.70s) : 0 (0.00%)
[1.70-1.89s] : 1 (0.10%)
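
For anyone wanting to reproduce a measurement of this shape, the statistics above (median, and ten equal-width histogram buckets from 0 to the maximum, with the last bucket closed) can be computed with a small harness. This is only a sketch and not the test that produced the results above: LatencyBench and its methods are hypothetical names, and the operation being timed (for example a ClientEncryption decrypt call) is assumed to be passed in by the caller as a Runnable.

```java
import java.util.Arrays;

/** Hypothetical timing harness; the measured operation is supplied by the caller. */
final class LatencyBench {
    /** Runs op n times sequentially and returns each request's duration in seconds. */
    static double[] time(int n, Runnable op) {
        double[] secs = new double[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            op.run();
            secs[i] = (System.nanoTime() - start) / 1e9;
        }
        return secs;
    }

    /** Median of the samples; requires at least one sample. */
    static double median(double[] secs) {
        if (secs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = secs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Counts samples in equal-width buckets over [0, max]; the maximum falls in the last bucket. */
    static int[] histogram(double[] secs, int buckets) {
        double max = Arrays.stream(secs).max().orElse(0.0);
        int[] counts = new int[buckets];
        for (double v : secs) {
            int idx = max == 0.0 ? 0 : (int) (v / max * buckets);
            counts[Math.min(idx, buckets - 1)]++;
        }
        return counts;
    }
}
```

Average requests per second is then the sample count divided by the sum of the durations; for the run above, 1000 / 386.25 s gives about 2.59.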

 

Comment by Ivan Zaitsev [ 22/Jan/24 ]

Hello jeff.yemin@mongodb.com, thanks for the quick response. I think for my use case I would extend the ClientEncryptionImpl and Crypt classes to implement additional caching. I have started noticing that Azure Key Vault sometimes responds with 'The service is unavailable.' and 'Read time out' errors. Retries add more latency, as the crypt request already has a 10-second timeout option.
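
The latency cost of retries can be made concrete with a sketch of a retry wrapper that caps the total time spent. BoundedRetry and all of its parameters are hypothetical and not part of the driver. Note that the wrapper cannot interrupt an attempt that is already in flight, so with a 10-second per-request timeout, three attempts can still take roughly 30 seconds plus backoff in the worst case.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of the MongoDB Java driver. */
final class BoundedRetry {
    /**
     * Calls op up to maxAttempts times. After a failure it sleeps for backoff and retries,
     * unless the attempts are used up or the remaining time budget is too small.
     * The last failure is rethrown when no attempt succeeds.
     */
    static <T> T call(Callable<T> op, int maxAttempts, Duration budget, Duration backoff)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long deadline = System.nanoTime() + budget.toNanos();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // e.g. a transient 'The service is unavailable.' error
            }
            long remaining = deadline - System.nanoTime();
            if (attempt == maxAttempts || remaining <= backoff.toNanos()) {
                break; // out of attempts or out of time budget
            }
            Thread.sleep(backoff.toMillis());
        }
        throw last;
    }
}
```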

Comment by Jeffrey Yemin [ 22/Jan/24 ]

Hi orxorandnot@gmail.com, the driver does actually cache data keys, but the code for that is in the C library that the driver wraps, which is why it can't be spotted by looking at the Java code.

Data keys are cached for one minute (which is not a configurable value).
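
To illustrate what a time-based key cache looks like in general, here is a minimal sketch of a cache whose entries expire after a fixed lifetime. It is not the C library's implementation: TtlCache, its constructor parameters, and the loader callback are hypothetical names chosen for illustration, and the loader stands in for whatever fetches and decrypts a key.

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical illustration of time-based expiry; not the driver's or the C library's cache. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;

        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier nanoClock; // injectable so expiry can be tested without sleeping

    TtlCache(Duration ttl, LongSupplier nanoClock) {
        this.ttlNanos = ttl.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Returns the cached value if it has not expired; otherwise reloads and caches it. */
    V get(K key, Function<K, V> loader) {
        long now = nanoClock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now - old.expiresAtNanos < 0) {
                return old; // still fresh
            }
            // Missing or expired: load again. The loader runs while the map entry is
            // locked, which keeps the sketch simple but blocks other callers for this key.
            return new Entry<>(loader.apply(k), now + ttlNanos);
        });
        return entry.value;
    }
}
```

A one-minute lifetime, matching the value mentioned above, would be constructed as new TtlCache<String, byte[]>(Duration.ofMinutes(1), System::nanoTime). If the loader throws, the exception propagates and the existing mapping is left unchanged.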

Comment by Ivan Zaitsev [ 22/Jan/24 ]

https://github.com/mongodb/mongo-java-driver/blob/d80e9c1de594113001a47ce0e8f9db5baca37249/driver-sync/src/main/com/mongodb/client/internal/ClientEncryptionImpl.java#L112

 

https://github.com/mongodb/mongo-java-driver/blob/d80e9c1de594113001a47ce0e8f9db5baca37249/driver-sync/src/main/com/mongodb/client/internal/Crypt.java#L357

Comment by Ivan Zaitsev [ 22/Jan/24 ]

On every operation, the current implementation calls MongoDB for the data encryption key and then the KMS for the customer master key.

These operations, especially the HTTP call to the KMS, add a lot of latency.

These keys do not change frequently, so I think it would be good to add caching to improve performance.

Comment by PM Bot [ 22/Jan/24 ]

Hi orxorandnot@gmail.com, thank you for reporting this issue! The team will look into it and get back to you soon.

Generated at Thu Feb 08 09:04:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.