better mongos handling of state where connections can be established but mongod unresponsive


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding
    • Sharding
    • ALL
    • 3

      mongos can hang when a sharded cluster contains a replica set shard whose primary keeps accepting connections but stops responding to any other request. Failover occurs and a new primary is elected normally, but sharded queries through mongos block and do not return.

      This failure mode has been observed on EC2.

      To reproduce locally:
      1) Run the included script (sets up a sharded cluster with a sharded collection)
      2) Run mongo localhost:31000 to connect to the mongos
      3) > use foo
      4) > db.bar.find().itcount()
      5) Stop all data transfer on established connections to the primary via iptables:
      sudo /sbin/iptables -A INPUT -i lo -p tcp -m tcp --dport <mongod primary port> -m conntrack --ctstate ESTABLISHED -j DROP
      6) (wait for failover)
      7) > db.bar.find().itcount()
      The query in step 7 blocks and does not return.
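      The iptables rule from step 5 stays in place until it is deleted, so the local machine remains in the simulated failure state after the test. The sketch below is one possible way to keep the add and delete commands in sync. The helper names (rule_spec, block_cmd, unblock_cmd) are hypothetical and not part of the included script, and the helpers only print the commands rather than running them. It relies on iptables -D accepting the same rule specification that was passed to -A.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the original script.

# Rule specification copied from step 5; $1 is the mongod primary port.
rule_spec() {
    printf '%s' "INPUT -i lo -p tcp -m tcp --dport $1 -m conntrack --ctstate ESTABLISHED -j DROP"
}

# Print the command that appends the DROP rule (starts the simulated hang).
block_cmd() {
    printf 'sudo /sbin/iptables -A %s\n' "$(rule_spec "$1")"
}

# Print the command that deletes the same rule (-D takes the identical spec).
unblock_cmd() {
    printf 'sudo /sbin/iptables -D %s\n' "$(rule_spec "$1")"
}
```

      Calling block_cmd with the primary's port prints the step 5 command, and unblock_cmd with the same port prints the matching cleanup command. Either output can be reviewed and then piped to sh to run it.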

              Assignee: [DO NOT USE] Backlog - Sharding Team
              Reporter: Greg Studer (Inactive)
              Votes: 4
              Watchers: 8
