  1. Core Server
  2. SERVER-17019

HA setup doesn't work if member totally and quickly disappears

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: 2.6.6
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None

      Description

      We have a problem with our replica set. It runs on three virtual servers, and if any of the mongod processes goes down, the set normally keeps working with the remaining members. However, if any of the servers disappears completely, i.e. stops responding to network traffic at all (for example after taking its network interface down, blocking all outgoing traffic with a firewall, or powering the server off suddenly), every query to the replica set takes an extra 15 seconds. Judging from the network traffic, the delay is due to TCP retransmits.
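
      The two failure modes described above can be told apart with a bare TCP connect and a short timeout. What follows is a minimal sketch and not part of the original report: the `probe` helper, the host names, and the 2-second timeout are illustrative assumptions. When only the mongod process is stopped, the host's kernel normally answers with a reset and the connect fails at once; when the host is unreachable, the connect gets no reply and blocks until the timeout expires.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt a TCP connect and return (outcome, elapsed_seconds).

    outcome is one of "connected", "refused", "timeout", or "error: ...".
    """
    start = time.monotonic()
    try:
        # create_connection applies the timeout to the connect attempt itself.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # Host is up but nothing listens on the port (process stopped).
        outcome = "refused"
    except socket.timeout:
        # No reply at all within the timeout (host or network is gone).
        outcome = "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or "no route to host".
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical member addresses; replace with the real replica set hosts.
    members = [("mongo1.example.com", 27017),
               ("mongo2.example.com", 27017),
               ("mongo3.example.com", 27017)]
    for host, port in members:
        outcome, elapsed = probe(host, port)
        print(f"{host}:{port} {outcome} after {elapsed:.2f}s")
```

      A member that reports "timeout" rather than "refused" matches the situation in which the extra delay appears.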

      The extra 15 seconds on every query makes our load balancer conclude that all nodes are down, and it cuts off traffic to the whole setup.

      Since connecting to the other replica set members with the mongo shell works fine, we originally reported this as a bug in the Node.js driver (https://jira.mongodb.org/browse/NODE-350), but we later tried the PHP driver and were able to reproduce similar (although not identical) behaviour.

      We also reproduced this problem in our secondary setup in another data center, so it shouldn't be data center specific. Both may be running the same virtualization platform, though; we haven't looked into that yet.

      Any ideas how to go forward with this?

            People

            Assignee:
            schwerin Andy Schwerin
            Reporter:
            kvirta Kalle Varisvirta
            Votes:
            1
            Watchers:
            10
