Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.2.2
Affects Version/s: 2.2.25, 2.2.33
Component/s: MongoDB 3.2
Labels:
- external-user
Environment:
Ubuntu 14.04

Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

Expected: All pool connections reconnect properly after a new primary is elected

Observed: Some connections in the pool queue queries indefinitely

Details:

Given a three-machine MongoDB 3.2.17 replica set with at least one sharded collection, connected to from a fourth machine through a local mongos using node-mongodb-native 2.2.33 (and every other driver version we tested), we see the following when the primary is lost abruptly (e.g. the primary machine or process crashes). The replica set elects a new primary without trouble, and the election is reflected in the mongos logs. However, node-mongodb-native ends up with some connections in its pool hung indefinitely: they queue queries without ever completing them or returning errors.

Here is a test script that will demonstrate the problem when run against that configuration:
https://gist.github.com/brettkiefer/82f65b5a3795caaf66a3dfd3b4c3f2a1
(also attached as repeatCounts.js)

The surest way to reproduce the issue is to run something like that script and abruptly cut the network on the replica set primary, e.g. with `sudo ifconfig eth0 down`.

We have been unable to find any MongoDB server, mongos, or node-mongodb-native configuration options that make this behave as expected (that is, that make the bad connections to mongos reconnect). Instead, we have resorted to detecting the condition in application code by looking for queries that stack up or hang. Detection this way takes longer than we would like, so we suffer a partial or complete outage until the bad connections are found.
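The application-level workaround described above (watching for queries that stack up or hang) might look something like the following minimal sketch. `InFlightTracker`, its threshold, and the polling approach are hypothetical illustrations, not part of node-mongodb-native and not the reporter's actual code. The tracker records when each query starts, forgets it once it settles, and reports any query that has been outstanding longer than a chosen threshold.

```javascript
// Hypothetical watchdog for spotting queries that "stack up" or hang.
// Nothing here is part of node-mongodb-native; it only wraps whatever
// promise the application's query call already returns.
class InFlightTracker {
  constructor(thresholdMs) {
    this.thresholdMs = thresholdMs; // how long a query may run before it counts as hung
    this.started = new Map();       // query id -> start time in ms
    this.nextId = 0;
  }

  // Record the start of a query and return an id for it.
  begin(now = Date.now()) {
    const id = this.nextId++;
    this.started.set(id, now);
    return id;
  }

  // Record that a query settled (with a result or an error).
  end(id) {
    this.started.delete(id);
  }

  // Number of queries currently outstanding.
  get inFlight() {
    return this.started.size;
  }

  // Ids of queries outstanding for longer than the threshold.
  hung(now = Date.now()) {
    const result = [];
    for (const [id, startedAt] of this.started) {
      if (now - startedAt > this.thresholdMs) result.push(id);
    }
    return result;
  }

  // Wrap a promise so it is tracked until it settles either way.
  track(promise) {
    const id = this.begin();
    return promise.then(
      (value) => { this.end(id); return value; },
      (err) => { this.end(id); throw err; }
    );
  }
}
```

An application would wrap each query, e.g. `tracker.track(collection.count({}))`, and poll `tracker.hung()` on a timer. What to do once it returns ids (for example, closing and recreating the client) is an application decision, and, as noted above, this kind of detection is inherently slower than the driver reconnecting on its own.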

repeatCounts.js
2 kB
Oct 23 2017 04:37:45 PM UTC

depends on

NODE-1290 SDAM Refactor

Development Complete

related to

NODE-1340 Reconnecting to Mongos proxy server never fires the reconnect event

Closed

Assignee:: Matt Broadstone
Reporter:: Brett Kiefer
Reviewers:: None
Votes:: 4
Watchers:: 5

Created:: Oct 23 2017 04:38:56 PM UTC
Updated:: Oct 29 2023 01:44:19 PM UTC
Resolved:: Apr 15 2019 02:11:01 PM UTC