Core Server / SERVER-13500

Changing replica set configuration can crash running members

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 2.6.1, 2.7.0
    • Affects Version/s: 2.6.0-rc3
    • Component/s: Replication
    • Labels: None
    • Operating System: Linux

      Issue Status as of April 15, 2014

      ISSUE SUMMARY
      Initializing a new replica set configuration (especially one that removes a member) can occasionally crash one or more running members of the replica set. Members that crash can be safely restarted.

      USER IMPACT
      Each crashed member reduces the number of members available to form a quorum. In the worst case, the replica set is left with no primary and becomes unavailable.
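
As a rough illustration of the quorum arithmetic, the sketch below checks whether the surviving members still form a strict majority. It assumes every member is a voting member and that each crashed member stays down; the function names are hypothetical and are not part of MongoDB.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is a strict majority of the set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, crashed: int) -> bool:
    """True if the members still running can form a majority."""
    return voting_members - crashed >= majority(voting_members)


if __name__ == "__main__":
    # (total voting members, members that crashed)
    for total, crashed in [(3, 1), (3, 2), (5, 2), (5, 3)]:
        print(total, crashed, can_elect_primary(total, crashed))
```

Under these assumptions, a three-member set tolerates one crashed member but not two, which is why a crash on multiple members can leave the set with no primary.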

      WORKAROUNDS
      If a member is being removed, shutting it down before removing it from the replica set configuration reduces (but does not eliminate) the chance of a crash. When no member is being removed, there is no known workaround.
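
The ordering in this workaround can be sketched as a small planning helper: shut the member down first, then apply a configuration that omits it. This is only an illustration. The host names are made up, the helper names are hypothetical, and nothing here talks to a server; it only builds the new configuration document (with the `version` field incremented) and lists the steps in order.

```python
import copy


def config_without_member(config: dict, host: str) -> dict:
    """Return a copy of a replica set config with `host` removed and `version` bumped."""
    if not any(m["host"] == host for m in config["members"]):
        raise ValueError(f"{host} is not in the configuration")
    new_config = copy.deepcopy(config)  # leave the caller's document untouched
    new_config["members"] = [m for m in new_config["members"] if m["host"] != host]
    new_config["version"] = config["version"] + 1
    return new_config


def removal_plan(config: dict, host: str) -> list:
    """Ordered steps for the workaround: shut down first, reconfigure second."""
    return [
        ("shutdown", host),
        ("reconfig", config_without_member(config, host)),
    ]


if __name__ == "__main__":
    # Hypothetical three-member configuration.
    cfg = {
        "_id": "rs0",
        "version": 1,
        "members": [
            {"_id": 0, "host": "a.example.net:27017"},
            {"_id": 1, "host": "b.example.net:27017"},
            {"_id": 2, "host": "c.example.net:27017"},
        ],
    }
    for step, detail in removal_plan(cfg, "c.example.net:27017"):
        print(step, detail)
```

In practice the two steps would be carried out by stopping the mongod being removed and then reconfiguring from the primary, for example with `rs.remove()` in the mongo shell.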

      RESOLUTION
      The replica set handshake handler was fixed to be resilient to the receipt of handshakes while initializing a new replica set configuration.

      AFFECTED VERSIONS
      Version 2.6.0 is affected by this bug.

      PATCHES
      The patch is included in the 2.6.1 production release.

      Original description

      If a member is removed from the replica set configuration while it is still running, a mongod may crash.

      To avoid this bug, please follow the documented procedure below for removing a node and shut down the member before removing it from the configuration:
      http://docs.mongodb.org/manual/tutorial/remove-replica-set-member/

      Example of what a crash may look like:

      2014-04-07T01:51:56.698+0000 [SyncSourceFeedbackThread] SEVERE: Invalid access at address: 0xa8
      2014-04-07T01:51:56.744+0000 [SyncSourceFeedbackThread] SEVERE: Got signal: 11 (Segmentation fault).
      Backtrace:0x11bd301 0x11bc6de 0x11bc7cf 0x32d740f710 0xeacaf6 0xeb19e8 0x1145332 0x1201c99 0x32d74079d1 0x32d70e8b6d 
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11bd301]
       /usr/bin/mongod() [0x11bc6de]
       /usr/bin/mongod() [0x11bc7cf]
       /lib64/libpthread.so.0() [0x32d740f710]
       /usr/bin/mongod(_ZN5mongo18SyncSourceFeedback13replHandshakeEv+0xb86) [0xeacaf6]
       /usr/bin/mongod(_ZN5mongo18SyncSourceFeedback3runEv+0x9b8) [0xeb19e8]
       /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x1145332]
       /usr/bin/mongod() [0x1201c99]
       /lib64/libpthread.so.0() [0x32d74079d1]
       /lib64/libc.so.6(clone+0x6d) [0x32d70e8b6d]
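
To check whether a log shows this particular crash, one option is to look for a signal 11 followed by a stack frame in `SyncSourceFeedback::replHandshake`. The sketch below is an assumption-laden illustration, not a MongoDB tool: the function names are hypothetical, and the decoder handles only the simple nested-name form of mangled C++ symbols seen in the trace above. A real demangler such as `c++filt` is the better choice for anything else.

```python
import re

# A frame such as "/usr/bin/mongod(_ZN5mongo3Foo3barEv+0x1a) [0x...]".
FRAME = re.compile(r"\((_Z\w+)\+0x[0-9a-fA-F]+\)")

CRASH_FUNCTION = "mongo::SyncSourceFeedback::replHandshake"


def qualified_name(symbol: str):
    """Decode the nested-name part of a mangled symbol (_ZN<len><name>...E).

    Returns None for any symbol that is not of that simple form.
    """
    if not symbol.startswith("_ZN"):
        return None
    i, parts = 3, []
    while i < len(symbol) and symbol[i].isdigit():
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        length = int(symbol[i:j])          # length prefix of the next component
        parts.append(symbol[j:j + length])
        i = j + length
    if parts and i < len(symbol) and symbol[i] == "E":
        return "::".join(parts)
    return None


def matches_crash_signature(log_lines) -> bool:
    """True if a segfault is followed by a replHandshake frame in the log."""
    saw_segfault = False
    for line in log_lines:
        if "Got signal: 11" in line:
            saw_segfault = True
        frame = FRAME.search(line)
        if saw_segfault and frame and qualified_name(frame.group(1)) == CRASH_FUNCTION:
            return True
    return False
```

For example, `qualified_name("_ZN5mongo18SyncSourceFeedback13replHandshakeEv")` yields `mongo::SyncSourceFeedback::replHandshake`, the frame at the top of the trace above.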
      

            Assignee: Eric Milkie (milkie@mongodb.com)
            Reporter: Akshay Kumar (akshay@mongodb.com)
            Votes: 0
            Watchers: 14
