ISSUE SUMMARY
Some distributions ship OpenSSL packages, including the current OpenSSL packages for Red Hat Enterprise Linux and CentOS, that contain a faulty implementation of the AES-GCM cipher, which causes a crash after 64GB of data have been transmitted.
USER IMPACT
The instability caused by this issue makes it difficult to complete an initial sync of replica set members and causes frequent crashes.
WORKAROUNDS
- Upgrade to an upstream release of OpenSSL instead of using the version provided by the distribution's package manager.
- Downgrade to an older version of OpenSSL.
- Use the new flag net.ssl.sslCipherConfig in MongoDB to configure the OpenSSL cipher suites directly and disable AES-GCM. NOTE: configure the OpenSSL cipher suites with care to avoid enabling unsafe ciphers.
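Because a hand-written cipher string can silently enable weak suites or leave AES-GCM in place, it helps to see what a candidate string actually expands to before handing it to mongod. The sketch below is illustrative only: the cipher string `HIGH:!aNULL:!eNULL:!AESGCM` is a hypothetical example rather than a vetted recommendation, the helper names are made up for this sketch, and Python's `ssl` module may be linked against a different OpenSSL build than mongod, so its output is only indicative. Running `openssl ciphers -v '<string>'` against the same library mongod loads is the more direct check. How the string is passed to mongod is governed by MongoDB's documentation for the cipher configuration option, not by this snippet.

```python
import ssl

# Hypothetical candidate cipher string: OpenSSL's HIGH set, minus anonymous
# and null-encryption suites, minus all AES-GCM suites. Not a recommendation.
CANDIDATE = "HIGH:!aNULL:!eNULL:!AESGCM"


def expand_cipher_string(cipher_string):
    """Return the (TLS 1.2 and older) suite names that the OpenSSL library
    linked into Python expands cipher_string to."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)  # raises ssl.SSLError if nothing matches
    # In OpenSSL 1.1.1+ TLS 1.3 suites are configured separately and are not
    # affected by set_ciphers(), so leave them out of the comparison.
    return [c["name"] for c in ctx.get_ciphers() if c["protocol"] != "TLSv1.3"]


def aes_gcm_suites(names):
    """Return only the suite names that use AES in GCM mode."""
    return [n for n in names if "AES" in n and "GCM" in n]


if __name__ == "__main__":
    print("Python's ssl module is linked against:", ssl.OPENSSL_VERSION)
    suites = expand_cipher_string(CANDIDATE)
    for name in suites:
        print(name)
    print("AES-GCM suites still enabled:", aes_gcm_suites(suites))
```

An empty final list means the candidate string excludes every AES-GCM suite. The printed suite list is still worth reading line by line, because the warning above about unsafe ciphers applies to whatever remains.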
AFFECTED VERSIONS
MongoDB 2.6 and 3.0 builds that use an affected OpenSSL library.
FIX VERSION
The fix is included in the 2.6.9 and 3.0.2 production releases.
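To check whether a particular deployment already has the configuration option, compare its version against the first fixed release in its series. The following minimal sketch encodes only the two releases named in this ticket (2.6.9 and 3.0.2). It accepts plain `major.minor.patch` strings and rejects any other series, and the function names are hypothetical.

```python
# First release in each series that ships the cipher-configuration option,
# as stated in this ticket. Other series are deliberately not covered.
FIXED_IN = {(2, 6): 9, (3, 0): 2}


def parse_version(version):
    """Split a plain 'major.minor.patch' string such as '2.6.5' into ints."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def has_cipher_option(version):
    """Return True if this 2.6.x or 3.0.x release includes the fix.

    Raises ValueError for series this ticket does not cover."""
    major, minor, patch = parse_version(version)
    first_fixed = FIXED_IN.get((major, minor))
    if first_fixed is None:
        raise ValueError(f"series {major}.{minor} is not covered by this ticket")
    return patch >= first_fixed
```

For example, the reporter's 2.6.4 and 2.6.5 builds both return False. A True result only means the option is available. The problematic cipher is avoided only after the option is actually configured.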
RESOLUTION DETAILS
Implemented a configuration option that prevents mongod from using the problematic cipher, giving users who cannot change their OpenSSL version an additional option.
Original description
We're using MongoDB (we tried both 2.6.4 and 2.6.5), compiled with SSL ourselves, and we're having problems recovering servers when they have to do a complete (initial) sync:
2014-11-11T12:34:38.769+0100 [rsSync] ERROR: SSL: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
2014-11-11T12:34:38.782+0100 [rsSync] SocketException: remote: 148.251.189.232:27018 error: 9001 socket exception [CONNECT_ERROR]
2014-11-11T12:34:38.818+0100 [rsSync] ERROR: SSL: error:140D00CF:SSL routines:SSL_write:protocol is shutdown
2014-11-11T12:34:38.818+0100 [rsSync] caught exception (socket exception [CONNECT_ERROR] for ) in destructor (~PiggyBackData)
2014-11-11T12:34:38.819+0100 [rsSync] replSet initial sync exception: 16465 recv failed while exhausting cursor 9 attempts remaining
After failing multiple times, it finally gives up on the initial sync and shuts down.
What's interesting is that it always fails at approximately the same position (number of cloned objects), around 290K.
We could reproduce the same behavior on another server in a different shard of the same cluster.
We're using OpenSSL 1.0.1e 30.el6_6.4 on our mongod instances, and the same version was used to compile MongoDB.
As far as I understand, MongoDB uses a dynamically linked library, so it shouldn't actually matter which version it was compiled with.
We did find an OpenSSL bug that produces the same error message, but that bug was fixed in June 2014, and the OpenSSL version we're using already includes the fix.
Has anyone else run into this problem? Is it an OpenSSL bug or a MongoDB bug?
I've attached a log file from one of the affected servers, with debug mode enabled.