[SERVER-8833] Segmentation fault after replSet relinquishing primary state, network/SSL related
Created: 03/Mar/13  Updated: 08/Mar/13  Resolved: 04/Mar/13

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Dave Claussen
Assignee: Eric Milkie
Resolution: Duplicate
Votes: 0
Labels: connection, segfault, ssl
Environment:
  # uname -a
    Linux (hostname) 2.6.18-194.3.1.el5.028stab069.6 #1 SMP Tue Aug 10 21:28:51 GMT 2010 x86_64 x86_64 x86_64 GNU/Linux
  # cat /etc/release
    CentOS release 5.8 (Final)

MongoDB is running with SSL in a three-member replica set: one primary, one secondary, and one arbiter.
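The "can't see a majority of the set" step in the log comes down to majority arithmetic. With three voting members (the arbiter votes but holds no data), a primary needs to see a strict majority, two members including itself, to stay primary. Once getaddrinfo() failed for both the secondary and the arbiter, the primary could count only itself. The POSIX shell sketch below illustrates just that count; majority_needed and primary_decision are hypothetical helpers, not MongoDB code, and MongoDB's actual replica set logic tracks more state than this single number.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the strict-majority rule; not MongoDB code.

# Strict majority of N voting members: floor(N/2) + 1.
majority_needed() {
  echo $(( $1 / 2 + 1 ))
}

# $1 = total voting members (primary + secondary + arbiter = 3)
# $2 = members the primary can currently reach, counting itself
primary_decision() {
  if [ "$2" -ge "$(majority_needed "$1")" ]; then
    echo "PRIMARY"
  else
    echo "SECONDARY"
  fi
}

primary_decision 3 3   # all hostnames resolve: prints PRIMARY
primary_decision 3 1   # secondary and arbiter unresolvable: prints SECONDARY
```

For a two-member set without an arbiter, majority_needed 2 is also 2, so losing either member would force a step-down; the arbiter is what lets a three-member set survive the loss of one member.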


Issue Links:
Duplicate
duplicates SERVER-5487 Seg fault when shutting down replica ... Closed
Operating System: Linux
Steps To Reproduce:

1. Set up an SSL MongoDB replica set with a primary, a secondary, and an arbiter, addressed by hostname (possibly via /etc/hosts).
2. Run the server.
3. Remove the hostnames so that they no longer resolve.
4. Probably also send requests to the server while this is happening.

Note: I'm obviously guessing that the hostname failure is the cause; it may be something completely different. Normally we run for days on end without this happening.


 Description   

We use Cloudflare for our DNS. Their DNS went down this morning at 9:47 UTC as described here (http://techcrunch.com/2013/03/03/cloudflare-is-down-due-to-dns-outage-taking-down-785000-websites-including-4chan-wikileaks-metallica-com/).

That caused our primary MongoDB instance to relinquish primary state, because our replica set configuration currently uses hostnames (I'll change it in the future so that it does not).

That much I would expect. But it also segfaulted (I've changed the hostnames in the log to be more descriptive):

Sun Mar 3 03:52:12 [rsHealthPoll] getaddrinfo("secondary.ourhost.com") failed: Name or service not known
Sun Mar 3 03:52:18 [rsHealthPoll] getaddrinfo("arbiter.ourhost.com") failed: Name or service not known
Sun Mar 3 03:52:22 [rsHealthPoll] getaddrinfo("secondary.ourhost.com") failed: Name or service not known
Sun Mar 3 03:52:28 [rsHealthPoll] getaddrinfo("arbiter.ourhost.com") failed: Name or service not known
Sun Mar 3 03:52:28 [rsMgr] can't see a majority of the set, relinquishing primary
Sun Mar 3 03:52:28 [rsMgr] replSet relinquishing primary state
Sun Mar 3 03:52:28 [rsMgr] replSet SECONDARY
Sun Mar 3 03:52:28 [rsMgr] replSet closing client sockets after relinquishing primary
Sun Mar 3 03:52:31 Invalid access at address: 0xf8 from thread: conn249138

Sun Mar 3 03:52:31 Got signal: 11 (Segmentation fault).

Sun Mar 3 03:52:31 Backtrace:
0xb046c1 0x55a639 0x55abc2 0x2b23f2b5bbe0 0x2b23f2d8912c 0x2b23f2d895dd 0x2b23f2d86751 0xaf4029 0xaf8624 0xaef01c 0xaf0ee7 0x2b23f2b5377d 0x2b23f3d7425d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xb046c1]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x55a639]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x55abc2]
/lib64/libpthread.so.0 [0x2b23f2b5bbe0]
/lib64/libssl.so.6(ssl3_read_n+0x14c) [0x2b23f2d8912c]
/lib64/libssl.so.6(ssl3_read_bytes+0x3ad) [0x2b23f2d895dd]
/lib64/libssl.so.6 [0x2b23f2d86751]
/usr/bin/mongod(_ZN5mongo6Socket11unsafe_recvEPci+0x9) [0xaf4029]
/usr/bin/mongod(_ZN5mongo6Socket4recvEPci+0xc4) [0xaf8624]
/usr/bin/mongod(_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x8c) [0xaef01c]
/usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x3f7) [0xaf0ee7]
/lib64/libpthread.so.0 [0x2b23f2b5377d]
/lib64/libc.so.6(clone+0x6d) [0x2b23f3d7425d]
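The mongod frames in the backtrace are mangled C++ symbols. A small sketch for pulling them out so they can be demangled with c++filt (from GNU binutils), assuming the backtrace has been saved to a file; mangled_symbols is a hypothetical helper name. Frames with no C++ symbol, such as the bare libpthread address or the C function ssl3_read_n, are skipped.

```shell
#!/bin/sh
# Hypothetical helper: read a mongod backtrace on stdin and print the mangled
# C++ symbol from each frame of the form
#   /usr/bin/mongod(_ZN5mongo6Socket4recvEPci+0xc4) [0xaf8624]
# Frames without an _Z symbol (bare addresses, C functions like ssl3_read_n)
# produce no output.
mangled_symbols() {
  sed -n 's/^.*(\(_Z[^+)]*\)+0x[0-9a-f]*).*$/\1/p'
}

# Usage (backtrace.txt is a placeholder file name):
#   mangled_symbols < backtrace.txt | c++filt
```

For example, c++filt turns _ZN5mongo6Socket11unsafe_recvEPci into mongo::Socket::unsafe_recv(char*, int), which makes it easier to read that the fault occurred inside an SSL read issued from the connection thread conn249138.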



 Comments   
Comment by Eric Milkie [ 04/Mar/13 ]

Hi Dave.
This was fixed in version 2.2.3; see the linked ticket (SERVER-5487) for more details.

Comment by Dave Claussen [ 03/Mar/13 ]

FYI: Rather than changing the configuration to use IP addresses, I just put the hostnames into /etc/hosts. They don't change often, and I never want the primary to relinquish control because of a DNS issue, so this was the quickest short-term fix for me.
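With the names pinned in /etc/hosts, it is worth confirming that every member still resolves through the system resolver on each host. A minimal sketch, assuming a getent that follows /etc/nsswitch.conf (as glibc's does) and therefore sees /etc/hosts entries; check_members is a hypothetical helper, and the hostnames are the placeholder names from the log.

```shell
#!/bin/sh
# Hypothetical helper: report whether each hostname resolves via getent,
# which follows /etc/nsswitch.conf (so /etc/hosts entries are consulted).
# Returns non-zero if any hostname fails to resolve.
check_members() {
  status=0
  for host in "$@"; do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "ok: $host"
    else
      echo "UNRESOLVED: $host"
      status=1
    fi
  done
  return "$status"
}

# Usage (placeholder names from the log):
#   check_members secondary.ourhost.com arbiter.ourhost.com
```

Any UNRESOLVED line means that member would likely hit the same kind of resolution failure seen in the rsHealthPoll messages.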

Comment by Dave Claussen [ 03/Mar/13 ]

 
# openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

(Same version as reported by "yum info openssl-devel".)

Comment by Scott Hernandez (Inactive) [ 03/Mar/13 ]

What version of ssl do you have installed?

I'm not much of a CentOS expert, but I think you can find out by running "yum info openssl-devel" or by searching for the libssl .so files (ls -laht /usr/lib/libssl*).
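Since the crash is inside libssl, it can also help to ask the dynamic linker which libssl/libcrypto the mongod binary actually loads; on this x86_64 host the backtrace shows /lib64/libssl.so.6 rather than a file under /usr/lib. A sketch that filters ldd output; ssl_libs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the name and
# resolved path of any libssl/libcrypto dependency, from lines like
#   libssl.so.6 => /lib64/libssl.so.6 (0x...)
ssl_libs() {
  awk '$1 ~ /^lib(ssl|crypto)\./ && $2 == "=>" { print $1, $3 }'
}

# Usage:
#   ldd /usr/bin/mongod | ssl_libs
```

If the path printed for libssl matches the library reported by "openssl version", the crash is happening in the system OpenSSL rather than a separately installed copy.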

(I updated the affects version)

Comment by Dave Claussen [ 03/Mar/13 ]

MongoDB version:

  # mongod --version
    db version v2.2.2, pdfile version 4.5
    Sun Mar 3 10:19:49 git version: nogitversion

Note: the "affects version" in my original report should be 2.2.2, not 2.2.0; that was a typo.

It was built from source:
1. Set up prerequisite libraries
2. Build from source:
   wget http://downloads.mongodb.org/src/mongodb-src-r2.2.2.tar.gz
   tar xvfz mongodb-src-*
   cd mongodb-src-r2.2.2
   scons install -j 9 --64 --ssl --prefix=/tmp/mongodb-linux-2.2.2-x86_64
3. Set up certificate

Comment by Scott Hernandez (Inactive) [ 03/Mar/13 ]

What specific version of MongoDB are you using (please post the exact "mongod --version" output), and how was it built and/or deployed (from source, a package, a tgz)? Please also attach the rest of your logs, starting from before the name-resolution errors.

As an aside: you will generally want to stick with DNS hostnames, but set up caching on each host to ride out temporary upstream resolution issues.
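One way to get per-host caching on CentOS 5 is nscd, whose /etc/nscd.conf has an "enable-cache hosts" directive. A sketch that checks whether host caching is switched on; hosts_cache_enabled is a hypothetical helper, and the path defaults to the standard /etc/nscd.conf. As far as I know, nscd only answers from its cache until an entry's positive-time-to-live expires, so it bridges short upstream outages rather than long ones.

```shell
#!/bin/sh
# Hypothetical helper: succeed if nscd is configured to cache host lookups.
# $1 = path to nscd.conf (defaults to /etc/nscd.conf)
hosts_cache_enabled() {
  conf=${1:-/etc/nscd.conf}
  grep -Eq '^[[:space:]]*enable-cache[[:space:]]+hosts[[:space:]]+yes' "$conf" 2>/dev/null
}

if hosts_cache_enabled; then
  echo "nscd host caching is enabled"
else
  echo "nscd host caching is not enabled (or /etc/nscd.conf is missing)"
fi
```

On CentOS 5 the daemon itself is started with "service nscd start" and enabled at boot with "chkconfig nscd on".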

Generated at Thu Feb 08 03:18:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.