- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 7.0.12
- Component/s: None
- Networking & Observability
- ALL
- Security 2024-08-05, Security 2024-08-19, Networking & Obs 2024-09-02, Networking & Obs 2024-09-16, Networking & Obs 2024-09-30, Networking & Obs 2024-10-14, Networking & Obs 2024-10-28, Networking & Obs 2024-11-11, Networking & Obs 2024-11-25, Networking & Obs 2024-12-09, Networking & Obs 2024-12-23, Networking & Obs 2025-01-06, Networking & Obs 2025-01-20, Networking & Obs 2025-02-03
OS: CentOS Stream release 9 (Linux xxx-test-db-2.azr.etn 5.14.0-472.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 27 20:15:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux)
Mongo version: 7.0.12 (the same behaviour occurs with the build from your yum repository, mongodb-org-server-7.0.12-1.el9.x86_64, and with our custom build with debug info)
HW: Azure VM Standard F16s v2 (16 vcpus, 32 GiB memory)
Clients:
- Mongodb-exporter-0.39.0-0.el9.x86_64
- mongo-java-client-4.6.1
Current configuration:
systemLog:
  destination: file
  logAppend: true
  logRotate: reopen
  path: /var/log/mongodb/mongod.log
storage:
  dbPath: /var/lib/mongo
  engine: wiredTiger
  directoryPerDB: false
processManagement:
  fork: false
  pidFilePath: /var/run/mongodb/mongod.pid
  timeZoneInfo: /usr/share/zoneinfo
# network interfaces
net:
  bindIp: 0.0.0.0 # Listen to local interface only, comment to listen on all interfaces.
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/xxx-test-db-2.azr.etn.pem
    CAFile: /etc/ssl/xxx-test-db-2.azr.etn.CA.pem
    allowConnectionsWithoutCertificates: true
    allowInvalidCertificates: true
    logVersions: TLS1_0,TLS1_1,TLS1_2,TLS1_3
  ipv6: false
  maxIncomingConnections: 500
  port: 27017
replication:
  oplogSizeMB: 1024
  replSetName: repl-xxx-test
security:
  authorization: enabled
  keyFile: /etc/mongo.key
We have identified a bug in the reading of a TLS stream. Mongo tries to read from a malfunctioning TLS stream (error SSL_ERROR_SYSCALL), and the connection thread then enters an infinite loop. The bug occurs whether Mongo runs in a replica set (on both primary and secondary nodes) or as a single instance.
The system shows increased load, but there is no significant IO activity. The load is generated by connection threads. These threads do not appear in the db.currentOp() output.
The stack trace (pmp_strace.log) shows that these threads spend most of their time in the SSL handling parts of the code (i.e. functions such as ERR_clear_error and SSL_read) or in the memory free function (tc_free, called from ERR_clear_error).
Our investigation starts in the function engine::perform. There we found that ssl_error is 0x5 (SSL_ERROR_SYSCALL; according to https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html it means "Some non-recoverable, fatal I/O error occurred.").
The asio::error_code we get varies between connection threads. Examples:
Once the condition ssl_error == SSL_ERROR_SYSCALL matches, the function returns 0 (want_nothing).
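To make the point about the return value concrete, here is a minimal, simplified model of the classification step. It is not the asio source: `classify` and `PerformResult` are hypothetical names, the `failed` flag stands in for a non-empty asio::error_code, and the handling of pending output is left out. The constants carry the numeric values from OpenSSL's headers, and the `Want` values mirror the ones the report refers to (want_nothing == 0).

```cpp
// Simplified model only; names below are hypothetical stand-ins.

// Numeric values as defined by OpenSSL.
constexpr int kSslErrorNone = 0;       // SSL_ERROR_NONE
constexpr int kSslErrorWantRead = 2;   // SSL_ERROR_WANT_READ
constexpr int kSslErrorWantWrite = 3;  // SSL_ERROR_WANT_WRITE
constexpr int kSslErrorSyscall = 5;    // SSL_ERROR_SYSCALL

// What the engine asks its caller to do next.
enum Want {
    want_input_and_retry = -2,
    want_output_and_retry = -1,
    want_nothing = 0,
    want_output = 1
};

struct PerformResult {
    Want want;    // next action requested from the caller
    bool failed;  // stands in for a non-empty asio::error_code
};

// Maps an SSL error status to the caller-visible result.
constexpr PerformResult classify(int ssl_error) {
    switch (ssl_error) {
        case kSslErrorNone:
            return PerformResult{want_nothing, false};
        case kSslErrorWantRead:
            return PerformResult{want_input_and_retry, false};
        case kSslErrorWantWrite:
            return PerformResult{want_output_and_retry, false};
        case kSslErrorSyscall:
            // Fatal I/O error: nothing more is wanted, but the error is set.
            return PerformResult{want_nothing, true};
        default:
            // Any other status is treated as a failure in this model.
            return PerformResult{want_nothing, true};
    }
}
```

In this model a successful operation and a fatal SSL_ERROR_SYSCALL both yield want_nothing, so only the error flag tells the caller that the stream is dead.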
The next interesting part is in the function asio::detail::read_buffer_sequence. Mongo detects that the buffer is not empty.
It then calls the read_some function.
That function brings us back to engine::perform again.
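The cycle described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the asio or MongoDB code: `read_until_full` is a hypothetical function, every read attempt is assumed to fail with zero bytes transferred (as with SSL_ERROR_SYSCALL), and the `stop_on_error` parameter models whether the loop treats the error as terminal. Whether the real loop ignores the error in this way is an assumption, not something confirmed here. The `max_iterations` cap exists only so that the example terminates.

```cpp
#include <cstddef>

// Toy model only; returns how many read attempts were made before the
// loop exited. All names and parameters are hypothetical.
constexpr std::size_t read_until_full(std::size_t buffer_size,
                                      bool stop_on_error,
                                      std::size_t max_iterations) {
    std::size_t filled = 0;      // bytes placed into the buffer so far
    std::size_t iterations = 0;  // read attempts made

    // Keep reading while the buffer still has unfilled space.
    while (filled < buffer_size && iterations < max_iterations) {
        ++iterations;

        // Model of a dead TLS stream: the read fails and transfers nothing.
        const bool failed = true;
        const std::size_t bytes_read = 0;

        if (failed && stop_on_error) {
            break;  // error treated as terminal: the loop ends
        }

        // Error not treated as terminal: no progress is made,
        // so the loop condition never changes.
        filled += bytes_read;
    }
    return iterations;
}
```

With `stop_on_error` set to false the model spins until it reaches the cap, which corresponds to the busy connection threads in the report; with it set to true the loop exits after the first failed read.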
The buffer contains the following data: