  1. Core Server
  2. SERVER-4094

better mongos handling of state where connections can be established but mongod unresponsive

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Sharding
    • ALL

      It's possible to get into a hung state in mongos given a sharded cluster with a replica set shard. If the shard primary continues to allow connections but does not respond to other requests, failover will occur and a new primary will be elected normally, but sharded queries via mongos will block and not return.

      This failure mode has been observed on EC2.
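
The state described above (connections succeed, but no request is ever answered) can be simulated without a real mongod. The following is a minimal sketch, not mongos code: `start_silent_server` and `probe` are hypothetical names, and the plain TCP "ping" stands in for a real MongoDB request. It shows that a successful connect alone does not prove the peer is healthy, and that a read timeout is one way to tell the two states apart.

```python
import socket
import threading


def start_silent_server():
    """Listen on an ephemeral localhost port; accept connections but never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    held = []  # keep accepted sockets open so the client never sees EOF

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, held


def probe(host, port, timeout_s):
    """Return 'connect-failed', 'unresponsive', 'closed', or 'responsive'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except OSError:
        return "connect-failed"
    try:
        sock.settimeout(timeout_s)
        sock.sendall(b"ping\n")  # placeholder request, not a MongoDB message
        data = sock.recv(1)
        return "responsive" if data else "closed"
    except socket.timeout:
        # Connection was established, but no reply arrived in time.
        return "unresponsive"
    except OSError:
        return "connect-failed"
    finally:
        sock.close()
```

Against the silent server, `probe("127.0.0.1", port, 0.3)` connects successfully and then reports `unresponsive` once the read timeout expires. Without a timeout, the same read would block indefinitely, which matches the hang reported here.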

      Can be reproduced locally by:
      1) Running the script included (sets up a sharded cluster with sharded collection)
      2) Running mongo localhost:31000 to connect to the mongos
      3) > use foo
      4) > db.bar.find().itcount()
      5) Stopping all data transfer for established connections at the primary via iptables:
      sudo /sbin/iptables -A INPUT -i lo -p tcp -m tcp --dport <mongod primary port> -m conntrack --ctstate ESTABLISHED -j DROP
      6) (wait for failover)
      7) > db.bar.find().itcount()
      (this second query blocks and does not return)
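
For repeated test runs it helps to add and later remove the rule from step 5 with the same argument list. The helper below is a sketch under assumptions: `iptables_drop_rule` is a hypothetical name, and port 31001 in the usage note is only a placeholder for the actual mongod primary port. It builds the command from step 5 unchanged and, with `-D` instead of `-A`, the matching command that deletes the rule afterwards. It only builds the argument list and does not run anything.

```python
def iptables_drop_rule(port, action="-A"):
    """Build the argv for the DROP rule used in step 5.

    action "-A" appends the rule; "-D" deletes the identical rule again.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' (append) or '-D' (delete)")
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1-65535")
    return [
        "sudo", "/sbin/iptables", action, "INPUT",
        "-i", "lo",                      # loopback only, as in the local repro
        "-p", "tcp", "-m", "tcp", "--dport", str(port),
        "-m", "conntrack", "--ctstate", "ESTABLISHED",
        "-j", "DROP",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, for example `iptables_drop_rule(31001)` before step 6 and `iptables_drop_rule(31001, "-D")` once the test is finished.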

            Assignee: backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter: greg_10gen Greg Studer
            Votes: 4
            Watchers: 8
