Type: Task
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: Security
Product Performance
Problem
Under encryption-at-rest, every WiredTiger page-in on a cache miss calls the AES decrypt path, which constructs a fresh SymmetricDecryptorOpenSSL instance. The constructor calls EVP_CIPHER_CTX_new() and the destructor calls EVP_CIPHER_CTX_free(), so each decrypt pays one allocate/free pair for the OpenSSL cipher-context object alone, with no cryptographic work done in either call. On a 100% read, out-of-cache workload with 22K+ block reads/sec per node, this costs ~1.06% cumulative CPU on the decrypt critical path (SymmetricDecryptor::create 0.84% + the ~SymmetricDecryptorOpenSSL destructor 0.22%). The cost is dominated by CRYPTO_zalloc of a ~200-byte CTX struct that is immediately discarded.
Solution
Move the EVP_CIPHER_CTX ownership out of the per-call SymmetricDecryptorOpenSSL instance and into a thread_local std::unique_ptr<EVP_CIPHER_CTX, ...> inside the class. _ctx becomes a borrowed raw pointer. Each constructor call invokes EVP_CIPHER_CTX_reset(_ctx) before initCipherContext(...). Resetting is OpenSSL's documented idiom for reusing a CTX across independent messages: it discards key/IV/GCM state while keeping the allocation. The thread-local unique_ptr's destructor runs at thread exit and calls EVP_CIPHER_CTX_free, so each thread frees its CTX exactly once. Scope is confined to SymmetricDecryptorOpenSSL: there is no public-API change, and the encryption path is deliberately left untouched (writes are cold on this workload). The "at most one decryptor live per thread at any instant" invariant is load-bearing, because every decryptor on a thread borrows the same CTX and a second live instance would reset the first one's state. It was verified against every SymmetricDecryptor::create caller: all are synchronous create/use/destroy patterns with no suspension points.
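
The ownership change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: FakeCtx, fakeCtxNew, fakeCtxReset, and fakeCtxFree are hypothetical stand-ins for EVP_CIPHER_CTX, EVP_CIPHER_CTX_new, EVP_CIPHER_CTX_reset, and EVP_CIPHER_CTX_free so the sketch builds without OpenSSL, and Decryptor stands in for SymmetricDecryptorOpenSSL. The debug assertion on the one-live-decryptor invariant is an addition for illustration and is not described in the ticket.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-in for EVP_CIPHER_CTX (assumption, not the OpenSSL type).
struct FakeCtx {
    bool keyed = false;
};

// Counters so the allocation behaviour can be observed.
std::atomic<int> g_allocs{0};
std::atomic<int> g_frees{0};

// Stand-ins for EVP_CIPHER_CTX_new / _reset / _free.
FakeCtx* fakeCtxNew() { ++g_allocs; return new FakeCtx(); }
void fakeCtxReset(FakeCtx* ctx) { ctx->keyed = false; }  // drop per-message state, keep allocation
void fakeCtxFree(FakeCtx* ctx) { ++g_frees; delete ctx; }

struct FakeCtxDeleter {
    void operator()(FakeCtx* ctx) const { fakeCtxFree(ctx); }
};

// Stand-in for SymmetricDecryptorOpenSSL: borrows the thread's context.
class Decryptor {
public:
    Decryptor() : _ctx(threadCtx()) {
        // Illustrative guard for "at most one decryptor live per thread".
        assert(!liveOnThisThread() && "second live decryptor on this thread");
        liveOnThisThread() = true;
        fakeCtxReset(_ctx);   // reset before every reuse
        _ctx->keyed = true;   // stands in for initCipherContext(...)
    }
    ~Decryptor() { liveOnThisThread() = false; }  // no free here: _ctx is borrowed

    Decryptor(const Decryptor&) = delete;
    Decryptor& operator=(const Decryptor&) = delete;

    bool keyed() const { return _ctx->keyed; }

private:
    // Allocated on first use per thread; freed by the deleter at thread exit.
    static FakeCtx* threadCtx() {
        thread_local std::unique_ptr<FakeCtx, FakeCtxDeleter> ctx(fakeCtxNew());
        return ctx.get();
    }
    static bool& liveOnThisThread() {
        thread_local bool live = false;
        return live;
    }

    FakeCtx* _ctx;  // borrowed raw pointer, owned by the thread-local unique_ptr
};

// Runs n create/use/destroy cycles on the calling thread and returns the
// total number of context allocations made so far.
int allocationsAfterCalls(int n) {
    for (int i = 0; i < n; ++i) {
        Decryptor d;
        (void)d.keyed();
    }
    return g_allocs.load();
}
```

In this sketch, any number of sequential create/use/destroy cycles on one thread share a single allocation, and the matching free is deferred to thread exit. A function-local thread_local is used here for brevity; the ticket places the thread_local inside the class, and the exact declaration there is not shown in the source.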