  1. Core Server
  2. SERVER-8833

Segmentation fault after replSet relinquishing primary state, network/SSL related

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: 2.2.2
    • Fix Version/s: None
    • Component/s: Stability
    • Environment:
    • Operating System:
      Linux
    • Steps To Reproduce:
      1. Set up an SSL MongoDB replica set with a primary, a secondary, and an arbiter, addressed by host names (possibly defined in /etc/hosts).
      2. Run the server.
      3. Remove the host names so that they no longer resolve.
      4. Requests probably need to be running against the server at the same time.

      Note: I'm only guessing that the hostname issue is the cause; it may be something completely different. Normally we run for days on end without this happening.
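
      To confirm that step 3 has taken effect, a check along the following lines may help. It is only a sketch: the function names `unresolvable` and `sees_majority` are made up for illustration, and the majority rule shown is the usual "more than half of the members" reading of the "can't see a majority of the set" log message, not code taken from mongod.

```python
import socket


def unresolvable(hosts, resolve=socket.getaddrinfo):
    """Return the host names for which name resolution fails."""
    failed = []
    for host in hosts:
        try:
            # Same libc call that rsHealthPoll reports as failing in the log.
            resolve(host, None)
        except socket.gaierror:
            failed.append(host)
    return failed


def sees_majority(total_members, reachable_members):
    """True if reachable_members (including this node) is a strict majority."""
    return reachable_members * 2 > total_members


if __name__ == "__main__":
    # Placeholder host names, matching the anonymised ones in the log.
    members = ["secondary.ourhost.com", "arbiter.ourhost.com"]
    lost = unresolvable(members)
    reachable = 1 + len(members) - len(lost)  # 1 counts the primary itself
    print("unresolvable:", lost)
    print("primary keeps majority:", sees_majority(1 + len(members), reachable))
```

      With both other members unresolvable, a three-member set leaves the primary seeing only itself, which is not a majority, so stepping down is the expected part of the behaviour.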


      Description

      We use Cloudflare for our DNS. Their DNS went down this morning at 9:47 UTC as described here (http://techcrunch.com/2013/03/03/cloudflare-is-down-due-to-dns-outage-taking-down-785000-websites-including-4chan-wikileaks-metallica-com/).

      That caused our primary instance of MongoDB to relinquish primary state, because we currently use hostnames in our replica set configuration [I'll change this in the future so it does not].

      That I would expect. But it also segfaulted (I've changed the host names to be more descriptive):

      Sun Mar 3 03:52:12 [rsHealthPoll] getaddrinfo("secondary.ourhost.com") failed: Name or service not known
      Sun Mar 3 03:52:18 [rsHealthPoll] getaddrinfo("arbiter.ourhost.com") failed: Name or service not known
      Sun Mar 3 03:52:22 [rsHealthPoll] getaddrinfo("secondary.ourhost.com") failed: Name or service not known
      Sun Mar 3 03:52:28 [rsHealthPoll] getaddrinfo("arbiter.ourhost.com") failed: Name or service not known
      Sun Mar 3 03:52:28 [rsMgr] can't see a majority of the set, relinquishing primary
      Sun Mar 3 03:52:28 [rsMgr] replSet relinquishing primary state
      Sun Mar 3 03:52:28 [rsMgr] replSet SECONDARY
      Sun Mar 3 03:52:28 [rsMgr] replSet closing client sockets after relinquishing primary
      Sun Mar 3 03:52:31 Invalid access at address: 0xf8 from thread: conn249138

      Sun Mar 3 03:52:31 Got signal: 11 (Segmentation fault).

      Sun Mar 3 03:52:31 Backtrace:
      0xb046c1 0x55a639 0x55abc2 0x2b23f2b5bbe0 0x2b23f2d8912c 0x2b23f2d895dd 0x2b23f2d86751 0xaf4029 0xaf8624 0xaef01c 0xaf0ee7 0x2b23f2b5377d 0x2b23f3d7425d
      /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xb046c1]
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x55a639]
      /usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x55abc2]
      /lib64/libpthread.so.0 [0x2b23f2b5bbe0]
      /lib64/libssl.so.6(ssl3_read_n+0x14c) [0x2b23f2d8912c]
      /lib64/libssl.so.6(ssl3_read_bytes+0x3ad) [0x2b23f2d895dd]
      /lib64/libssl.so.6 [0x2b23f2d86751]
      /usr/bin/mongod(_ZN5mongo6Socket11unsafe_recvEPci+0x9) [0xaf4029]
      /usr/bin/mongod(_ZN5mongo6Socket4recvEPci+0xc4) [0xaf8624]
      /usr/bin/mongod(_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x8c) [0xaef01c]
      /usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x3f7) [0xaf0ee7]
      /lib64/libpthread.so.0 [0x2b23f2b5377d]
      /lib64/libc.so.6(clone+0x6d) [0x2b23f3d7425d]
