[SERVER-5806] Mongo Segfaults When Assigning Elastic IP to Running Server Created: 09/May/12  Updated: 15/Aug/12  Resolved: 14/May/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Adam Flynn Assignee: Eric Milkie
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 11.10, Linux 3.0.0-14, running on EC2


Issue Links:
Duplicate
duplicates SERVER-4129 server segfault when starting with a ... Closed
Operating System: Linux
Participants:

 Description   

When I bring up one of my MongoDB instances (specifically the secondary) without assigning it the correct elastic IP right away, it segfaults when I later do assign the IP. Beforehand I get a bunch of log messages where it tries to connect to the primary and errors out because its current hostname doesn't match what's in the replica set config (expected), but once its hostname changes and it retries, it just dies.

Not the end of the world, but it's pretty annoying because you have to manually remap the IPs on every server restart.

Last few log entries (the replSet error happens a bunch of times):

Wed May 9 16:20:50 [rsStart] warning: getaddrinfo("ec2-184-72-56-224.us-west-1.compute.amazonaws.com") failed: Name or service not known
Wed May 9 16:20:50 [rsStart] getaddrinfo("ec2-184-72-56-224.us-west-1.compute.amazonaws.com") failed: Name or service not known
Wed May 9 16:20:50 [rsStart] replSet error self not present in the repl set configuration:
Wed May 9 16:20:50 [rsStart] { _id: "wish-friend", version: 1, members: [
    { _id: 1, host: "ec2-50-18-168-160.us-west-1.compute.amazonaws.com:27017", priority: 2.0 },
    { _id: 2, host: "ec2-184-72-56-224.us-west-1.compute.amazonaws.com:27017" },
    { _id: 99, host: "ec2-50-18-235-86.us-west-1.compute.amazonaws.com:27017", arbiterOnly: true }
] }
Wed May 9 16:20:50 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
Wed May 9 16:20:53 [initandlisten] connection accepted from 127.0.0.1:50347 #58
Wed May 9 16:20:53 [conn58] end connection 127.0.0.1:50347
Wed May 9 16:20:54 [initandlisten] connection accepted from 127.0.0.1:50348 #59
Wed May 9 16:20:54 [conn59] end connection 127.0.0.1:50348
Wed May 9 16:20:56 [initandlisten] connection accepted from 10.170.70.136:51824 #60
Wed May 9 16:21:06 [initandlisten] connection accepted from 10.166.57.162:49390 #61
Wed May 9 16:21:10 [rsStart] trying to contact ec2-50-18-168-160.us-west-1.compute.amazonaws.com:27017
Wed May 9 16:21:10 [rsStart] trying to contact ec2-50-18-235-86.us-west-1.compute.amazonaws.com:27017
Wed May 9 16:21:10 Invalid access at address: 0

Wed May 9 16:21:10 Got signal: 11 (Segmentation fault).

Wed May 9 16:21:10 Backtrace:
0xa90999 0xa90f70 0x7f3a3f8cf060 0x7f3a3f65c4fb 0x7c80cd 0x7c9a47 0x7c9f3d 0xaab3e0 0x7f3a3f8c6efc 0x7f3a3ee6089d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa90999]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0xa90f70]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10060) [0x7f3a3f8cf060]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSsC1ERKSs+0xb) [0x7f3a3f65c4fb]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl10loadConfigEv+0x32d) [0x7c80cd]
/usr/bin/mongod(_ZN5mongo11ReplSetImplC2ERNS_14ReplSetCmdlineE+0x3f7) [0x7c9a47]
/usr/bin/mongod(_ZN5mongo13startReplSetsEPNS_14ReplSetCmdlineE+0x5d) [0x7c9f3d]
/usr/bin/mongod(thread_proxy+0x80) [0xaab3e0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f3a3f8c6efc]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f3a3ee6089d]

Logstream::get called in uninitialized state
Wed May 9 16:21:10 ERROR: Client::shutdown not called: rsStart



 Comments   
Comment by Eric Milkie [ 14/May/12 ]

Ok, let us know if it becomes more of a problem for you and we can consider backporting.
Thanks for the report!

Comment by Adam Flynn [ 11/May/12 ]

So far, I've only seen it when restarting my servers if there's a lag between bringing the server up and getting the final IP assigned. Not bothering me enough to need a backport.

Comment by Eric Milkie [ 11/May/12 ]

Hi Adam,
The stack trace leads me to believe that this is SERVER-4129, which I fixed for 2.1.1 but was not backported to 2.0. The list of discovered seeds is accessed by multiple threads and wasn't protected by a mutex.

Is this something that we should backport – how often does this affect you?

Generated at Thu Feb 08 03:09:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.