Technical Details:
Replica Set Size: 7 members
TLS Configuration:
mongod --tlsMode allowTLS --tlsAllowInvalidHostnames --tlsCertificateKeyFile /etc/mongo/certs/server.pem --tlsCAFile /etc/mongo/certs/ca.pem --setParameter enableFlowControl=false --ipv6 --bind_ip_all --port 27020 --dbpath=/data/db/wt-27020 --replSet rs-shard-1 --quiet --slowms 500 --logpath /data/db/mongo-27020.log --setParameter minSnapshotHistoryWindowInSeconds=1 --setParameter diagnosticDataCollectionEnabled=false --setParameter oplogFetcherUsesExhaust=false --setParameter allowMultipleArbiters=true --setParameter storageEngineConcurrentWriteTransactions=8 --setParameter storageEngineConcurrentReadTransactions=8 --logappend --oplogSize 100 --logRotate reopen --wiredTigerCacheSizeGB .26
Observed Behavior:
- Suddenly Slow queries are started after that we see high CPU spike randomly.
- Replication connections (inter-node) over TLS show higher latency during handshake and connection establishment compared to non-TLS (plain TCP).
- This is consistent across all replica set members.
- The issue persists with the provided TLS configuration and valid certificates.
- High CPU usage: We observe increased CPU utilization on mongod processes when TLS is enabled, especially during connection establishment and replication.
Impact:
- Increased time for replica set members to connect and sync.
- Overall replication lag is higher when TLS is enabled.
- Performance issues observed: Due to the increased latency and CPU usage, we are experiencing slower replication, delayed data consistency, and degraded application performance.
Steps to Reproduce:
- Deploy a 7-member replica set with the above TLS configuration.
- Measure connection establishment time and CPU usage between members with and without TLS.
- Compare replication lag, initial sync times, and CPU metrics.
Request:
- Investigate the cause of increased TLS connection times and high CPU usage in replica set communication.
- Provide recommendations or fixes to optimize TLS handshake, connection performance, and CPU utilization in MongoDB.
Logs
{"t":\{"$date":"2025-08-11T01:41:36.293+00:00"},"s":"I", "c":"ACCESS", "id":6788604, "ctx":"conn497","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":0}}
{"t":\{"$date":"2025-08-11T01:41:36.404+00:00"},"s":"W", "c":"ACCESS", "id":5626700, "ctx":"conn497","msg":"Client has attempted to reauthenticate as a single user","attr":{"user":
{"user":"__system","db":"local"}}}
{"t":\{"$date":"2025-08-11T01:41:36.404+00:00"},"s":"I", "c":"ACCESS", "id":5286306, "ctx":"conn497","msg":"Successfully authenticated","attr":{"client":"[fd00:3984:3989::1]:35738","isSpeculative":false,"isClusterMember":true,"mechanism":"SCRAM-SHA-256","user":"__system","db":"local","result":0,"metrics":{"conversation_duration":{"micros":110971,"summary":{"0":
{"step":1,"step_total":2,"duration_micros":80},"1":{"step":2,"step_total":2,"duration_micros":23}}}},"extraInfo":{}}}
{"t":\{"$date":"2025-08-11T01:46:36.295+00:00"},"s":"I", "c":"ACCESS", "id":6788604, "ctx":"conn497","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":0}}
{"t":\{"$date":"2025-08-11T01:46:36.406+00:00"},"s":"W", "c":"ACCESS", "id":5626700, "ctx":"conn497","msg":"Client has attempted to reauthenticate as a single user","attr":{"user":
{"user":"__system","db":"local"}}}
{"t":\{"$date":"2025-08-11T01:46:36.407+00:00"},"s":"I", "c":"ACCESS", "id":5286306, "ctx":"conn497","msg":"Successfully authenticated","attr":{"client":"[fd00:3984:3989::1]:35738","isSpeculative":false,"isClusterMember":true,"mechanism":"SCRAM-SHA-256","user":"__system","db":"local","result":0,"metrics":{"conversation_duration":{"micros":112016,"summary":{"0":
{"step":1,"step_total":2,"duration_micros":87},"1":{"step":2,"step_total":2,"duration_micros":40}}}},"extraInfo":{}}}
{"t":\{"$date":"2025-08-11T01:51:36.294+00:00"},"s":"I", "c":"ACCESS", "id":6788604, "ctx":"conn497","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":0}}
TLS connection slow logs and seen "authTimeMillis":334
{"t":\{"$date":"2025-08-08T07:24:20.594+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae........244:61]:27045","dnsResolutionTimeMillis":1,"tcpConnectionTimeMillis":111,"tlsHandshakeTime":null,"authTimeMillis":334,"hookTime":null,"totalTimeMillis":446}}
{"t":\{"$date":"2025-08-08T07:24:21.333+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae......244:ac]:27045","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":111,"tlsHandshakeTime":null,"authTimeMillis":332,"hookTime":null,"totalTimeMillis":443}}
{"t":\{"$date":"2025-08-08T07:24:21.401+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae00:3001:8311:172:16:244:bc]:27045","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":110,"tlsHandshakeTime":null,"authTimeMillis":334,"hookTime":null,"totalTimeMillis":444}}
TCP connection slow logs and seen "authTimeMillis":123
{"t":\{"$date":"2025-02-17T16:16:06.688+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae00:3.......44:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":123,"hookTime":null,"totalTimeMillis":163}}
{"t":\{"$date":"2025-02-17T16:19:27.967+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:a....:244:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":38,"tlsHandshakeTime":null,"authTimeMillis":126,"hookTime":null,"totalTimeMillis":164}}
{"t":\{"$date":"2025-02-17T16:21:41.482+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:.........244:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":123,"hookTime":null,"totalTimeMillis":163}}
{"t":\{"$date":"2025-02-17T16:24:57.232+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606.........44:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":125,"hookTime":null,"totalTimeMillis":165}}
{"t":\{"$date":"2025-02-17T16:26:34.810+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606.........16:244:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":121,"hookTime":null,"totalTimeMillis":161}}
{"t":\{"$date":"2025-02-17T16:26:38.594+00:00"},"s":"I", "c":"ASIO", "id":6496400, "ctx":"ReplNetwork","msg":"Slow connection establishment","attr":{"hostAndPort":"[260..........4:a8]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":38,"tlsHandshakeTime":null,"authTimeMillis":124,"hookTime":null,"totalTimeMillis":162}}
Cluster Member Authentication (2-step SCRAM-SHA-256):
temp.txt (__system user):
Average: ~111,000 microseconds (111ms)
Step breakdown: Step 1 (<del>60-270μs), Step 2 (</del>18-31μs)
temp-tcp.txt (__system user):
Average: ~41,000 microseconds (41ms)
Step breakdown: Step 1 (<del>92-883μs), Step 2 (</del>23-32μs)