Node.js Driver: NODE-1166

Failure to reconnect through mongos after abrupt replicaset primary failure


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.2.25, 2.2.33
    • Fix Version/s: 3.2.2
    • Component/s: MongoDB 3.2
    • Labels:
    • Environment:
      Ubuntu 14.04

      Description

      Expected: All pool connections reconnect properly after a new primary is elected

      Observed: Some connections in the pool queue queries indefinitely

      Details:

      Our setup is a three-machine MongoDB 3.2.17 replica set with at least one sharded collection, and a fourth machine that connects to it through a local mongos using node-mongodb-native 2.2.33 (the problem also occurs with every other version we tested). When the primary is lost abruptly (e.g. the primary machine or process crashes), the replica set elects a new primary without trouble, and the election is reflected in the mongos logs. However, node-mongodb-native ends up with some connections in its pool hung indefinitely: those connections queue queries without ever completing them or returning errors.

      Here is a test script that will demonstrate the problem when run against that configuration:
      https://gist.github.com/brettkiefer/82f65b5a3795caaf66a3dfd3b4c3f2a1
      (also attached as repeatCounts.js)

      The surest way to reproduce the issue is to run a script like that one and abruptly take down the network on the replica set primary, e.g. with `sudo ifconfig eth0 down`.

      We have been unable to find any MongoDB, mongos, or node-mongodb-native configuration options that make this behave as expected (that is, that make the bad connections to mongos reconnect). Instead, we have resorted to detecting the condition in application code by looking for queries that stack up or hang. This takes longer than we would like, so we suffer a partial or complete outage until the bad connections are detected.
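      The application-level detection described above (watching for queries that hang or stack up) might look roughly like the sketch below. It is a minimal illustration, not the reporter's code and not part of node-mongodb-native: `QueryWatchdog`, `timeoutMs`, `maxInFlight`, and `onHang` are hypothetical names, and `queryFn` stands in for any promise-returning driver call. It only surfaces the symptom sooner; it does not repair the hung pool connections.

```javascript
// Hypothetical application-level watchdog for hung queries; NOT part of
// node-mongodb-native. It observes promises only and cannot fix a bad
// pool connection; it just reports the symptom earlier.
class QueryWatchdog {
  constructor({ timeoutMs = 5000, maxInFlight = 100, onHang = () => {} } = {}) {
    this.timeoutMs = timeoutMs;     // per-query deadline
    this.maxInFlight = maxInFlight; // backlog size treated as "stacking up"
    this.onHang = onHang;           // called with { reason, inFlight }
    this.inFlight = 0;              // queries whose underlying promise has not settled
  }

  // Runs a promise-returning query function. The returned promise rejects
  // if the query has not settled within timeoutMs.
  run(queryFn) {
    this.inFlight += 1;
    if (this.inFlight > this.maxInFlight) {
      this.onHang({ reason: 'backlog', inFlight: this.inFlight });
    }

    const query = Promise.resolve().then(queryFn);
    // Decrement only when the underlying query settles, so a hung query
    // keeps counting toward the backlog even after its caller timed out.
    const settle = () => { this.inFlight -= 1; };
    query.then(settle, settle);

    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(() => {
        this.onHang({ reason: 'timeout', inFlight: this.inFlight });
        reject(new Error(`query exceeded ${this.timeoutMs} ms`));
      }, this.timeoutMs);
    });

    // Whichever settles first wins; the timer is always cleared afterwards.
    return Promise.race([query, deadline]).finally(() => clearTimeout(timer));
  }
}
```

      Counting a query as in flight until its underlying promise settles, rather than until the caller gives up, is what lets the backlog check notice queries that are queued on a stuck connection and never come back. How to react to a `'timeout'` or `'backlog'` event (for example, discarding the client and reconnecting) is left to the application and is an assumption here, not a documented driver recovery path.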

              People

              Assignee:
              Matt Broadstone
              Reporter:
              Brett Kiefer
              Participants:
              Votes:
              4
              Watchers:
              6

                Dates

                Created:
                Updated:
                Resolved: