Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Networking, Sharding
Labels: None
Assigned Teams: Service Arch
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Because connections are split between the legacy connection pool and the various NetworkInterfaceASIO connection pools, a network error detected through one pool is not propagated to the others.

Consider the following scenario:

In a sharded cluster, one shard is restarted. A client runs a find command against mongos, which fails because a bad connection is used. Mongos then correctly drops all the connections it has to the shard in that pool. The client retries the find and it works.

However, if the client then runs a 'count' command, it fails, because bad connections to the restarted shard are still present in the legacy connection pool.

The fix here is to drop all pooled connections to a bad host in ALL pools when a network error is detected.
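
The proposed fix can be illustrated with a minimal sketch. This is not the mongos implementation: `ConnectionPool`, `PoolRegistry`, `on_network_error`, and the host strings are all hypothetical names chosen for illustration, and the pools are toy stand-ins for the legacy and NetworkInterfaceASIO pools. The idea shown is only that every pool registers with one shared object, so a network error reported for a host drops that host's connections everywhere.

```python
from collections import defaultdict


class ConnectionPool:
    """Toy pool (hypothetical): maps a host string to its idle connections."""

    def __init__(self, name):
        self.name = name
        self._idle = defaultdict(list)

    def add(self, host, conn):
        self._idle[host].append(conn)

    def idle_count(self, host):
        return len(self._idle.get(host, []))

    def drop_host(self, host):
        """Discard every idle connection to `host`; return how many were dropped."""
        return len(self._idle.pop(host, []))


class PoolRegistry:
    """Hypothetical registry: a network error seen by one pool reaches all pools."""

    def __init__(self):
        self._pools = []

    def register(self, pool):
        self._pools.append(pool)
        return pool

    def on_network_error(self, host):
        """Drop connections to `host` in every registered pool."""
        return {pool.name: pool.drop_host(host) for pool in self._pools}


registry = PoolRegistry()
legacy = registry.register(ConnectionPool("legacy"))
asio = registry.register(ConnectionPool("asio"))

# Both pools hold one connection to each of two shards (placeholder objects).
for pool in (legacy, asio):
    pool.add("shard0:27018", object())
    pool.add("shard1:27018", object())

# A network error on shard0, detected through either pool, clears it from both.
dropped = registry.on_network_error("shard0:27018")
```

In this sketch, connections to the healthy host are left alone, so only the restarted shard pays the cost of reconnecting.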

I have also attached a jstest that reproduces the problem.

bad_connections_downed_shard.js
Mar 01 2016 03:44:47 PM UTC
1 kB
Adam Midvidy

Assignee: [DO NOT USE] Backlog - Service Architecture
Reporter: Adam Midvidy (Inactive)
Participants: [DO NOT USE] Backlog - Service Architecture, Adam Midvidy, Mira Carey, Scott Hernandez
Votes: 0
Watchers: 5

Created: Mar 01 2016 03:44:29 PM UTC
Updated: Dec 06 2022 04:32:01 AM UTC
Resolved: Feb 16 2018 04:07:47 PM UTC