[SERVER-59067] Fix TLS Race condition Created: 03/Aug/21  Updated: 29/Oct/23  Resolved: 10/Aug/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.1.0, 5.0.0-rc8
Fix Version/s: 5.0.4, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Sharding 2021-08-09, Sharding 2021-08-23
Participants:
Linked BF Score: 145

 Description   

The race is when initializing SSLManagerOpenSSL::_rolesNid, which creates a race. The problem is that this field is not static and OBJ_create() is invoked every time SSLManagerOpenSSL is invoked. The problem was amplified by the tenant migration code, which is using transient SSL state and is instantiating SSLManagerOpenSSL on every migration. The bug was always there but the new logic amplified it.

The race is present in all versions but is more serious in RHEL 7.0 using old OpenSSL. The difference is that in that old version the OBJ_cleanup() is actually erasing previous record if another OID with same keys is created. This make non properly guarded init to race with cleanup. In new version of OpenSSL OBJ_cleanup() is no-op.

Form the doc:

In OpenSSL versions prior to 1.1.0 OBJ_cleanup() cleaned up OpenSSLs internal object table and was called before an application exits if any new objects were added using OBJ_create(). This function is deprecated in version 1.1.0 and now does nothing if called. No explicit de-initialisation is now required. See OPENSSL_init_crypto(3) for further information.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 20/Sep/21 ]

Author:

{'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'}

Message: SERVER-59067: BACKPORT-10237 to fix race condition in TLS to 5.0
Branch: v5.0
https://github.com/mongodb/mongo/commit/3db2eccd331b3c382fea07f6c9c01edcbe8e8736

Comment by Githook User [ 10/Aug/21 ]

Author:

{'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'}

Message: SERVER-59067: fix TLS memory corruption
Branch: master
https://github.com/mongodb/mongo/commit/bf244ea14aa8b5c178d09a33e505520f45ce7b29

Comment by Andrew Shuvalov (Inactive) [ 10/Aug/21 ]

Essentially the bug was there forever. The only difference is that SSLManagerOpenSSL used to be a singleton while the new tenant migration project made it to be initialized each time a migration starts. The OpenSSL library is not internally synchronized and we used to change its internal object ID table without any protection. The fix is moving the OBJ_create() initializer from class field to static initializer.

Comment by Githook User [ 06/Aug/21 ]

Author:

{'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'}

Message: SERVER-59067: low level stress test for OpenSSL manager
Branch: master
https://github.com/mongodb/mongo/commit/a3124693d95c37633cca3935d3d2515f63222d22

Generated at Thu Feb 08 05:46:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.