[SERVER-8643] Calling rs.remove() causes a segmentation fault when using SSL
Created: 21/Feb/13  Updated: 08/Mar/13  Resolved: 21/Feb/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Ryan Bunker
Assignee: Eric Milkie
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Amazon Linux 64 bit


Issue Links:
Duplicate
duplicates SERVER-5487 Seg fault when shutting down replica ... Closed
Operating System: Linux
Steps To Reproduce:

1. Create a replica set and make sure all servers are using SSL
2. Attempt to remove a member of the replica set using rs.remove()
3. The primary member crashes with a segmentation fault
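
For reference, the log in the description shows the helper issuing a replSetReconfig whose version goes from 75237 to 75238 and whose member list no longer contains the removed host. The following is a minimal sketch of that config change as a pure function, assuming the helper simply drops the matching member and increments the version. buildRemoveConfig is a hypothetical name, not part of the mongo shell, and the _id of the removed member is assumed because it does not appear in the log.

```javascript
// Sketch only: models the config document that rs.remove() appears to send,
// based on the replSetReconfig command visible in the server log.
// buildRemoveConfig is a hypothetical helper, not a mongo shell function.
function buildRemoveConfig(config, host) {
  // Keep every member except the one whose host string matches.
  const members = config.members.filter((m) => m.host !== host);
  if (members.length === config.members.length) {
    throw new Error("couldn't find " + host + " in replica set config");
  }
  // A reconfig must carry a higher version than the current config.
  return { _id: config._id, version: config.version + 1, members: members };
}

// Config before the removal, reconstructed from the heartbeat (v: 75237).
const current = {
  _id: "triplink",
  version: 75237,
  members: [
    { _id: 1, host: "10.4.232.178:27017" },
    { _id: 2, host: "10.4.230.134:27017" }, // _id assumed, not shown in the log
    { _id: 3, host: "10.4.228.114:27017" },
  ],
};

const next = buildRemoveConfig(current, "10.4.230.134:27017");
console.log(JSON.stringify(next));
```

The resulting document matches the shape of the replSetReconfig payload in the log (version 75238, members 1 and 3). In a real deployment the shell sends this document to the primary. In the log the crash follows the "closing client sockets after relinquishing primary" message that comes after the reconfig is saved.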

 Description   

When removing a replica set member using rs.remove(), the primary crashes with a segmentation fault. This happens only when SSL is enabled; with SSL disabled, the member can be removed without issue.

Here is a log showing the seg fault:
Mon Feb 18 18:35:53 [journal] _groupCommit
Mon Feb 18 18:35:53 [journal] _groupCommit upgrade
Mon Feb 18 18:35:53 [journal] journal REMAPPRIVATEVIEW
Mon Feb 18 18:35:53 [journal] journal REMAPPRIVATEVIEW done startedAt: 3 n:2 0ms
Mon Feb 18 18:35:53 [journal] groupCommit end
Mon Feb 18 18:35:53 [rsHealthPoll] Sending command { replSetHeartbeat: "triplink", v: 75237, pv: 1, checkEmpty: false, from: "10.4.232.178:27017" } to 10.4.230.134:27017 with $auth: {}
Mon Feb 18 18:35:54 [rsHealthPoll] Sending command { replSetHeartbeat: "triplink", v: 75237, pv: 1, checkEmpty: false, from: "10.4.232.178:27017" } to 10.4.228.114:27017 with $auth: {}
Mon Feb 18 18:35:54 [conn18] runQuery called local.$cmd { count: "system.replset", query: {}, fields: {} }
Mon Feb 18 18:35:54 [conn18] run command local.$cmd { count: "system.replset", query: {}, fields: {} }
Mon Feb 18 18:35:54 [conn18] command local.$cmd command: { count: "system.replset", query: {}, fields: {} } ntoreturn:1 keyUpdates:0 locks(micros) r:28 reslen:48 0ms
Mon Feb 18 18:35:54 [conn18] runQuery called local.system.replset {}
Mon Feb 18 18:35:54 [conn18] query local.system.replset ntoreturn:1 keyUpdates:0 locks(micros) r:46 nreturned:1 reslen:208 0ms
Mon Feb 18 18:35:54 [conn18] runQuery called admin.$cmd { replSetReconfig: { _id: "triplink", version: 75238, members: [ { _id: 1, host: "10.4.232.178:27017" }, { _id: 3, host: "10.4.228.114:27017" } ] } }
Mon Feb 18 18:35:54 [conn18] run command admin.$cmd { replSetReconfig: { _id: "triplink", version: 75238, members: [ { _id: 1, host: "10.4.232.178:27017" }, { _id: 3, host: "10.4.228.114:27017" } ] } }
Mon Feb 18 18:35:54 [conn18] command: { replSetReconfig: { _id: "triplink", version: 75238, members: [ { _id: 1, host: "10.4.232.178:27017" }, { _id: 3, host: "10.4.228.114:27017" } ] } }
Mon Feb 18 18:35:54 [conn18] replSet replSetReconfig config object parses ok, 2 members specified
Mon Feb 18 18:35:54 [conn18] getMyAddrs(): [127.0.0.1] [10.4.232.178] [::1] [fe80::47b:33ff:fec4:80c9%eth0]
Mon Feb 18 18:35:54 [conn18] getallIPs("10.4.228.114"): [10.4.228.114]
Mon Feb 18 18:35:54 BackgroundJob starting: ConnectBG
Mon Feb 18 18:35:54 [conn18] Sending command { replSetHeartbeat: "triplink", v: -1, pv: 1, checkEmpty: false, from: "" } to 10.4.228.114:27017 with $auth: {}
Mon Feb 18 18:35:54 [conn18] replSet replSetReconfig [2]
Mon Feb 18 18:35:54 [conn18] replSet info saving a newer config version to local.system.replset
Mon Feb 18 18:35:54 [conn27] CoveredIndexMatcher::matches() {} 2:25f0 0
Mon Feb 18 18:35:54 [conn27] Matcher::matches() { ts: Timestamp 1361212554000|1, h: 4260825151744352383, v: 2, op: "n", ns: "", o: { msg: "Reconfig set", version: 75238 } }
Mon Feb 18 18:35:54 [conn27] CoveredIndexMatcher _docMatcher->matches() returns 1
Mon Feb 18 18:35:54 [conn27] getmore local.oplog.rs query: { ts: { $gte: new Date(5845321129736011777) } } cursorid:2107813182278322768 ntoreturn:0 keyUpdates:0 numYields: 1 locks(micros) r:379 nreturned:1 reslen:117 4684ms
Mon Feb 18 18:35:54 [conn21] CoveredIndexMatcher::matches() {} 2:25f0 0
Mon Feb 18 18:35:54 [conn21] Matcher::matches() { ts: Timestamp 1361212554000|1, h: 4260825151744352383, v: 2, op: "n", ns: "", o: { msg: "Reconfig set", version: 75238 } }
Mon Feb 18 18:35:54 [conn21] CoveredIndexMatcher _docMatcher->matches() returns 1
Mon Feb 18 18:35:54 [conn21] getmore local.oplog.rs query: { ts: { $gte: new Date(5845321129736011777) } } cursorid:2322837176604495716 ntoreturn:0 keyUpdates:0 locks(micros) r:198 nreturned:1 reslen:117 2410ms
Mon Feb 18 18:35:54 [conn18] replSet saveConfigLocally done
Mon Feb 18 18:35:54 [conn18] replSet attempting to relinquish
Mon Feb 18 18:35:54 [conn18] replSet relinquishing primary state
Mon Feb 18 18:35:54 [conn18] replSet SECONDARY
Mon Feb 18 18:35:54 [conn18] replSet closing client sockets after relinquishing primary
Mon Feb 18 18:35:54 Invalid access at address: 0x1d8 from thread: conn21

Mon Feb 18 18:35:54 Got signal: 11 (Segmentation fault).

Mon Feb 18 18:35:54 Backtrace:
0x9a0946 0x57d47d 0x57d7e7 0x7f2748a71500 0x7f274882ac1f 0x7f274882bb7d 0x7f2748828380 0x992a79 0x9961ef 0x98e89a 0x98fd87 0x7f2748a69851 0x7f274781911d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0x9a0946]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x39d) [0x57d47d]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x247) [0x57d7e7]
/lib64/libpthread.so.0(+0xf500) [0x7f2748a71500]
/usr/lib64/libssl.so.10(ssl3_send_alert+0x4f) [0x7f274882ac1f]
/usr/lib64/libssl.so.10(ssl3_read_bytes+0x21d) [0x7f274882bb7d]
/usr/lib64/libssl.so.10(+0x22380) [0x7f2748828380]
/usr/bin/mongod(_ZN5mongo6Socket11unsafe_recvEPci+0x9) [0x992a79]
/usr/bin/mongod(_ZN5mongo6Socket4recvEPci+0x2f) [0x9961ef]
/usr/bin/mongod(_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x8a) [0x98e89a]
/usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x437) [0x98fd87]
/lib64/libpthread.so.0(+0x7851) [0x7f2748a69851]
/lib64/libc.so.6(clone+0x6d) [0x7f274781911d]

And here is the shell output after issuing the command:
triplink:PRIMARY> rs.remove("10.4.230.134:27017")
Mon Feb 18 18:35:54 DBClientCursor::init call() failed
Mon Feb 18 18:35:54 query failed : admin.$cmd { replSetReconfig: { _id: "triplink", version: 75238, members: [ { _id: 1, host: "10.4.232.178:27017" }, { _id: 3, host: "10.4.228.114:27017" } ] } } to: 127.0.0.1:27017
Mon Feb 18 18:35:54 Error: error doing query: failed src/mongo/shell/collection.js:155
Mon Feb 18 18:35:54 trying reconnect to 127.0.0.1:27017
Mon Feb 18 18:35:54 reconnect 127.0.0.1:27017 ok
Mon Feb 18 18:35:54 SSL Error ret: -1 err: 1 error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure
Mon Feb 18 18:35:54 Socket say send() errno:0 Success 127.0.0.1:27017
> exit



 Comments   
Comment by Eric Milkie [ 21/Feb/13 ]

This bug is fixed in 2.2.3 and 2.3.2; please see the linked duplicate JIRA issue for more details.
