[SERVER-11807] Idle SSL replset has SSL errors and socket exceptions that aren't present otherwise Created: 21/Nov/13  Updated: 16/Nov/21  Resolved: 30/Apr/14

Status: Closed
Project: Core Server
Component/s: Security
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Kevin Pulo
Assignee: Kevin Pulo
Resolution: Won't Fix
Votes: 0
Labels: logging, ssl
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-11806 Distinct SSL messages for distinct ca... Closed
depends on SERVER-8864 Allow mixed SSL and non-SSL connections Closed
depends on SERVER-10968 Improved SSL error handling Closed
Related
Operating System: ALL
Participants:

 Description   

When a simple 3-node replset runs over localhost, the cluster idles and logs the normal messages for heartbeat connections being created and shut down 30 seconds later.

After restarting the cluster with SSL enabled, the same result would be expected. Instead, the logs show a variety of SSL errors and socket exceptions. The cluster otherwise functions normally: heartbeat connections are successfully re-established, and normal replset operations proceed without issue.

The socket exceptions are mostly CONNECT_ERROR and CLOSED, and sometimes RECV_ERROR or SEND_ERROR.

The SSL errors are usually

  • SSL23_GET_SERVER_HELLO
  • could not negotiate SSL connection: EOF detected

and sometimes

  • SSL Error ret when receiving: -1 err: 2 error:00000000:lib(0):func(0):reason(0)
  • SSL Error ret when receiving: -1 err: 5 error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
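
To compare an SSL run against a non-SSL run, it can help to tally these messages per category. The following is a minimal sketch, not part of the server: the substrings come from the messages quoted above, while the category names and the helper functions `classify_line` and `count_errors` are made up for illustration. The exact layout of a mongod log line is not assumed; the patterns only search for the quoted text anywhere in the line.

```python
import re
from collections import Counter

# Patterns are taken from the messages quoted in this ticket. The category
# names (dictionary keys) are labels invented for this sketch.
PATTERNS = {
    "ssl_server_hello": re.compile(r"SSL23_GET_SERVER_HELLO"),
    "ssl_negotiate_eof": re.compile(
        r"could not negotiate SSL connection: EOF detected"
    ),
    "ssl_recv_error": re.compile(r"SSL Error ret when receiving"),
    "socket_exception": re.compile(
        r"\b(CONNECT_ERROR|CLOSED|RECV_ERROR|SEND_ERROR)\b"
    ),
}


def classify_line(line):
    """Return the names of all categories whose pattern occurs in the line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


def count_errors(lines):
    """Tally how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        counts.update(classify_line(line))
    return counts
```

A typical use would be `count_errors(open(path))` over each node's log file, with `path` pointing at wherever the node writes its log. Note that the bare word CLOSED may also match unrelated log lines, so the socket exception count is only approximate.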

For testing, starting a large replset (e.g. 12 nodes) creates many connections, so errors are easily observed in under a minute of idling. Smaller replsets still hit the issue, but take proportionately longer.
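
A rough sketch of how the per-node command lines for such a test could be generated. Everything specific here is an assumption rather than the setup actually used for this ticket: the replset name, ports, directories and PEM path are placeholders, and the SSL options shown (`--sslOnNormalPorts`, `--sslPEMKeyFile`) are the 2.4-era ones as far as I know, so check them against `mongod --help` for your build. The sketch only builds argument lists; it does not start any process or initiate the replset.

```python
def mongod_commands(n_nodes=12, base_port=27017, replset="sslrepro",
                    pem_file="/path/to/server.pem", ssl=True):
    """Build one mongod argument list per node (all values are placeholders)."""
    commands = []
    for i in range(n_nodes):
        cmd = [
            "mongod",
            "--replSet", replset,
            "--port", str(base_port + i),
            "--dbpath", f"/tmp/{replset}/node{i}",
            "--logpath", f"/tmp/{replset}/node{i}.log",
            "--fork",
        ]
        if ssl:
            # Assumed 2.4-style SSL options; verify against your build.
            cmd += ["--sslOnNormalPorts", "--sslPEMKeyFile", pem_file]
        commands.append(cmd)
    return commands
```

Calling `mongod_commands(ssl=False)` gives the non-SSL baseline with otherwise identical settings, which keeps the two runs comparable. Each list can be passed to `subprocess.run` once the data directories exist.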

This is fixed by SERVER-8864 (specifically commit 9ca2fb0), SERVER-10968, and SERVER-11806.



 Comments   
Comment by Eric Milkie [ 30/Apr/14 ]

The issue has been fixed in 2.6; backporting to 2.4 is not possible at this time.

Comment by Eric Milkie [ 16/Jan/14 ]

The fixes in the related SERVER tickets went into version 2.5.3 and newer; the first production version with the fixes will be 2.6.

Comment by Zane Williamson [ 16/Jan/14 ]

I see these SSL errors randomly in our replica set as well. We are running 2.4.9. It appears this is still an outstanding bug/issue?
