[SERVER-13500] Changing replica set configuration can crash running members Created: 06/Apr/14  Updated: 11/Jul/16  Resolved: 10/Apr/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.0-rc3
Fix Version/s: 2.6.1, 2.7.0

Type: Bug
Priority: Critical - P2
Reporter: Akshay Kumar
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-13718 Secondary crashes on replSetUpdatePos... Closed
Operating System: Linux
Backport Completed:
Participants:

 Description   
Issue Status as of April 15, 2014

ISSUE SUMMARY
Initializing a new replica set configuration, especially one that removes a member, can occasionally crash one or more running members of the replica set. Members that crash can be safely restarted.

USER IMPACT
Crashed replica set members reduce the quorum. In the worst case, no primary can be elected and the replica set becomes unavailable.

WORKAROUNDS
If a member is being removed, shutting it down before removing it from the replica set configuration reduces (but does not eliminate) the chance of a crash. No workaround is known for configuration changes that do not remove a member.

RESOLUTION
The replica set handshake handler now tolerates handshakes received while a new replica set configuration is being initialized.
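
The fix commit listed in the comments describes the change as "not allowing NULL members to be added to _members map". The following is a minimal Python sketch of that kind of guard, not the actual C++ patch. The class HandshakeTracker, its methods, and the rid / member_id parameters are hypothetical names chosen for illustration; None stands in for a NULL member pointer.

```python
class HandshakeTracker:
    """Toy model of a handler that maps a handshake's remote ID to a member.

    All names here are hypothetical; this only illustrates the guard idea.
    """

    def __init__(self, config_members):
        # Active configuration: member id -> member description.
        self._config_members = dict(config_members)
        # Remote ID -> member, loosely analogous to the _members map.
        self._members = {}

    def set_config(self, config_members):
        """Install a new configuration (for example, after a member removal)."""
        self._config_members = dict(config_members)

    def handle_handshake(self, rid, member_id):
        """Record a handshake. Return False rather than store a missing member."""
        member = self._config_members.get(member_id)
        if member is None:
            # The member is not in the current configuration. Storing None
            # here would let later code dereference a missing member.
            return False
        self._members[rid] = member
        return True

    def host_for(self, rid):
        """Return the host recorded for a remote ID, or None if unknown."""
        member = self._members.get(rid)
        return None if member is None else member["host"]


tracker = HandshakeTracker({
    0: {"host": "a.example.net:27017"},
    1: {"host": "b.example.net:27017"},
})
accepted_before = tracker.handle_handshake("rid-1", 1)

# Reconfigure without member 1, then receive a handshake that still names it.
tracker.set_config({0: {"host": "a.example.net:27017"}})
accepted_after = tracker.handle_handshake("rid-2", 1)
```

In this sketch the late handshake is rejected, so the map never holds an empty entry for later lookups to trip over.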

AFFECTED VERSIONS
Version 2.6.0 is affected by this bug.

PATCHES
The patch is included in the 2.6.1 production release.

Original description

If a member is removed from the replica set configuration while it is still running, a mongod may crash.

To avoid this bug, please follow the documented procedure below for removing a node and shut down the member before removing it from the configuration:
http://docs.mongodb.org/manual/tutorial/remove-replica-set-member/
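
As a rough illustration of the ordering described above (shut the member down first, then reconfigure), the sketch below builds the configuration document that would be applied once the member has stopped. It assumes the usual configuration shape with `_id`, `version`, and a `members` list of `{_id, host}` entries. The helper name config_without_member and the example hosts are hypothetical, and the sketch does not contact a server.

```python
import copy


def config_without_member(config, host):
    """Return a copy of `config` with `host` removed and `version` incremented.

    Hypothetical helper: it only prepares the document and never talks to mongod.
    """
    remaining = [m for m in config["members"] if m["host"] != host]
    if len(remaining) == len(config["members"]):
        raise ValueError(f"{host} is not in the configuration")
    new_config = copy.deepcopy(config)
    new_config["members"] = copy.deepcopy(remaining)
    # A reconfiguration must carry a higher version than the current one.
    new_config["version"] = config["version"] + 1
    return new_config


current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "a.example.net:27017"},
        {"_id": 1, "host": "b.example.net:27017"},
        {"_id": 2, "host": "c.example.net:27017"},
    ],
}

# Step 1 (outside this sketch): shut down c.example.net:27017,
#         for example with db.shutdownServer() in the mongo shell.
# Step 2: only then build and apply the new configuration on the primary,
#         for example with rs.remove() or the replSetReconfig command.
new_config = config_without_member(current, "c.example.net:27017")
```

The helper leaves the input document untouched, so the old configuration is still at hand if the reconfiguration has to be retried.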

Example of what a crash may look like:

2014-04-07T01:51:56.698+0000 [SyncSourceFeedbackThread] SEVERE: Invalid access at address: 0xa8
2014-04-07T01:51:56.744+0000 [SyncSourceFeedbackThread] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11bd301 0x11bc6de 0x11bc7cf 0x32d740f710 0xeacaf6 0xeb19e8 0x1145332 0x1201c99 0x32d74079d1 0x32d70e8b6d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11bd301]
 /usr/bin/mongod() [0x11bc6de]
 /usr/bin/mongod() [0x11bc7cf]
 /lib64/libpthread.so.0() [0x32d740f710]
 /usr/bin/mongod(_ZN5mongo18SyncSourceFeedback13replHandshakeEv+0xb86) [0xeacaf6]
 /usr/bin/mongod(_ZN5mongo18SyncSourceFeedback3runEv+0x9b8) [0xeb19e8]
 /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x1145332]
 /usr/bin/mongod() [0x1201c99]
 /lib64/libpthread.so.0() [0x32d74079d1]
 /lib64/libc.so.6(clone+0x6d) [0x32d70e8b6d]



 Comments   
Comment by Githook User [ 10/Apr/14 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-13500 prevent syncSourceFeedback segfault by not allowing NULL members to be added to _members map

(cherry picked from commit ba3823f2a7c08a022bebbe8accebba8893582e09)
Branch: v2.6
https://github.com/mongodb/mongo/commit/be1905c24c7e5ea258e537fbf0d2c502c4fc6de2

Comment by Githook User [ 09/Apr/14 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-13500 prevent syncSourceFeedback segfault by not allowing NULL members to be added to _members map
Branch: master
https://github.com/mongodb/mongo/commit/ba3823f2a7c08a022bebbe8accebba8893582e09
