[SERVER-62035] Investigate large delays in `CertGetCertificateChain` on Windows
Created: 13/Dec/21  Updated: 05/Dec/22

Status: Backlog
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Amirsaman Memaripour
Assignee: Backlog - Security Team
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-54900 Blocking networking calls can delay s... Closed
Assigned Teams:
Server Security
Operating System: ALL
Sprint: Security 2022-02-07
Participants:

 Description   

SERVER-54900 introduced a new timer that lets TransportLayerASIO handle timeouts during SSL handshaking. The timer causes some of the external_auth tests to fail on Windows (see this patch for example). Those tests fail consistently on Windows and pass on all other platforms. The failures are caused by very long delays when the mongo shell connects to a sharded cluster:

[js_test:ldap_authz_authn] sh11980| MongoDB shell version v5.2.0-alpha-683-g49abfaf-patch-619c04589ccd4e13926f09c2
[js_test:ldap_authz_authn] sh11980| connecting to: mongodb://EC2AMAZ-N4BVR59:20547/test?authMechanism=MONGODB-X509&authSource=%24external&compressors=disabled&gssapiServiceName=mongodb
71 lines skipped.
[js_test:ldap_authz_authn] sh11980| 2021-11-22T23:42:07.355Z W  NETWORK  4780400 [js] "OCSP responder was slow to respond","attr":{"durationMillis":6285}
[js_test:ldap_authz_authn] sh11980| 2021-11-22T23:42:07.356Z W  NETWORK  23273   [js] "You have an IP Address in the DNS Name field on your certificate. This formulation is depreceated."
[js_test:ldap_authz_authn] sh11980| 2021-11-22T23:42:07.356Z W  NETWORK  23276   [js] "The server certificate does not match the host name","attr":{"remoteHost":"EC2AMAZ-N4BVR59","certificateNames":"localhost 127.0.0.1 , Subject Name: CN=server,OU=Kernel,O=MongoDB,L=New York City,ST=New York,C=US"}
[js_test:ldap_authz_authn] sh11980| Error: couldn't connect to server EC2AMAZ-N4BVR59:20547, connection attempt failed: NetworkTimeout: SSL handshake timed out after 7964 ms, started on 2021-11-22T23:41:59.392+00:00, completed on 2021-11-22T23:42:07.356+00:00, and was configured to time out after 5000 ms :
[js_test:ldap_authz_authn] sh11980| connect@src/mongo/shell/mongo.js:384:17
[js_test:ldap_authz_authn] sh11980| @(connect):3:6
43 lines skipped.
[js_test:ldap_authz_authn] sh11980| exception: connect failed
[js_test:ldap_authz_authn] sh11980| exiting with code 1

These large delays happen as we run the following Windows-specific code:
https://github.com/mongodb/mongo/blob/0b5f8fbf748fa7c8da75bd64ac9ca4ed322de321/src/mongo/util/net/ssl_manager_windows.cpp#L1753-L1767

For now, the new timer feature is disabled on Windows. This ticket should investigate why creating a timer thread causes such long delays when running CertGetCertificateChain on Windows. Once that issue is addressed, we need a separate ticket to enable SERVER-54900 on Windows.
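
As a starting point for the investigation, it may help to time the blocking call in isolation and compare it against the configured timeout, as the shell log above does (7964 ms elapsed against a 5000 ms limit). The following is a minimal, platform-neutral sketch of that measurement, not MongoDB code: `time_blocking_call`, `TimedResult`, and `describe` are hypothetical names, and the callable passed in stands in for whatever wraps CertGetCertificateChain in a real repro. It only measures the call; it does not interrupt it.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TimedResult:
    elapsed_ms: float   # wall-clock duration of the blocking call
    timeout_ms: float   # limit the call is compared against
    exceeded: bool      # True if the call outlasted the limit


def time_blocking_call(fn: Callable[[], object], timeout_ms: float) -> TimedResult:
    """Run fn() synchronously and report whether it outlasted timeout_ms.

    Hypothetical helper: fn stands in for the blocking chain-building call.
    """
    start = time.monotonic()  # monotonic clock, unaffected by wall-clock changes
    fn()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return TimedResult(elapsed_ms, timeout_ms, elapsed_ms > timeout_ms)


def describe(result: TimedResult) -> str:
    """Format the measurement as a one-line summary."""
    verdict = "exceeded" if result.exceeded else "within"
    return (
        f"blocking call took {result.elapsed_ms:.0f} ms, "
        f"{verdict} the {result.timeout_ms:.0f} ms timeout"
    )
```

Running the same measurement with and without a timer thread alive in the process would show whether the thread itself changes the duration of the call, which is the question this ticket raises.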



 Comments   
Comment by Spencer Jackson [ 13/Dec/21 ]

We are spending 11 seconds running CertGetCertificateChain, which is longer than the configured 5-second timeout. Is this long delay expected?

Probably not, no. Do you have any objection to assigning this ticket to backlog-server-security?
