Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.6.0-rc0
Affects Version/s: 2.4.9
Component/s: Internal Client
Labels: None
Environment: sharded cluster with 1 shard: hosta:30000, hostb:30001 and hostb:30002
Backwards Compatibility: Fully Compatible
Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

1. Issue secondary reads through mongos.
2. A secondary connection becomes pinned (hostb:30002).
3. The secondary goes into a blackhole (its packets are dropped).
4. The next query tries to reuse the dead secondary, even though the replica set monitor has detected that the node is unreachable.
5. The query hangs until the TCP timeout expires (15 minutes by default).
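
The reuse of the pinned host described above can be illustrated with a small model. This is a minimal sketch, not the actual DBClientReplicaSet code: `NodeState`, `select_secondary`, and the `is_secondary` flag are hypothetical names introduced here, and treating hosta:30000 as a healthy secondary is an assumption based on the `ok = true` line in the log below. It shows the behavior one would expect instead: the pinned host is reused only while the monitor still reports it as healthy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeState:
    """Hypothetical view of one replica set member as seen by the monitor."""
    host: str
    ok: bool            # monitor's last health verdict for this host
    is_secondary: bool  # assumption: role is tracked alongside health


def select_secondary(nodes: Dict[str, NodeState],
                     last_used: Optional[str]) -> Optional[str]:
    """Pick a host for a secondary read.

    The pinned (last used) host is reused only if the monitor still
    considers it reachable and it still matches the read preference.
    Otherwise fall back to any other healthy secondary.
    """
    pinned = nodes.get(last_used) if last_used is not None else None
    if pinned is not None and pinned.ok and pinned.is_secondary:
        return pinned.host
    for node in nodes.values():
        if node.ok and node.is_secondary:
            return node.host
    return None  # no usable secondary; caller should fail fast


# Node states mirroring the sample log (roles are assumed for illustration).
nodes = {
    "hosta:30000": NodeState("hosta:30000", ok=True, is_secondary=True),
    "hostb:30001": NodeState("hostb:30001", ok=False, is_secondary=False),
    "hostb:30002": NodeState("hostb:30002", ok=False, is_secondary=True),
}
chosen = select_secondary(nodes, last_used="hostb:30002")
print(chosen)
```

With the pinned host marked unreachable, the sketch falls through to hosta:30000 rather than handing back the dead connection.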

Sample mongos log:

Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[0].ok = true hosta:30000
Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[1].ok = false hostb:30001
Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[2].ok = false hostb:30002
Fri Jan 31 17:09:05.637 [conn5] trying reconnect to hostb:30001
Fri Jan 31 17:09:10.636 [conn5] reconnect hostb:30001 failed couldn't connect to server hostb:30001
Fri Jan 31 17:09:10.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30001 socket exception [CONNECT_ERROR] for hostb:30001
Fri Jan 31 17:09:10.636 [conn5] trying reconnect to hostb:30002
Fri Jan 31 17:09:15.636 [conn5] reconnect hostb:30002 failed couldn't connect to server hostb:30002
Fri Jan 31 17:09:15.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30002 socket exception [CONNECT_ERROR] for hostb:30002
Fri Jan 31 17:09:16.636 [conn5] warning: No primary detected for set shard01
Fri Jan 31 17:09:16.636 [conn5] User Assertion: 10009:ReplicaSetMonitor no master found for set: shard01
Fri Jan 31 17:09:16.636 [conn5] dbclient_rs say using secondary or tagged node selection in shard01, read pref is { pref: "secondary only", tags: [ {} ] } (primary : hostb:30001, lastTagged : hostb:30002)
Fri Jan 31 17:09:16.636 [conn5] dbclient_rs selecting compatible last used node hostb:30002
Fri Jan 31 17:09:16.637 [conn5] [pcursor] initialized query (lazily) on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing over 1 shards
Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 Connection timed out 10.225.15.113:30002
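
To quantify the stall visible in the log above, the gap between the last `[pcursor]` line and the `recv()` timeout can be computed directly from the timestamps. The helper names `time_of_day` and `stall` are hypothetical, and the sketch assumes both lines were logged on the same day, since only the time-of-day field is parsed.

```python
from datetime import datetime, timedelta


def time_of_day(log_line: str) -> datetime:
    """Parse the HH:MM:SS.mmm field (fourth token) of a mongos log line."""
    fields = log_line.split()
    return datetime.strptime(fields[3], "%H:%M:%S.%f")


def stall(start_line: str, end_line: str) -> timedelta:
    """Elapsed time between two log lines from the same day."""
    return time_of_day(end_line) - time_of_day(start_line)


start = "Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing over 1 shards"
end = ("Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 "
       "Connection timed out 10.225.15.113:30002")

elapsed = stall(start, end)
print(elapsed.total_seconds())
```

The difference comes to roughly 15.5 minutes, which is consistent with the query blocking on the dead socket until the TCP timeout fires.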

Is related to: SERVER-13125 DBClientRS should check that pinned hosts still match read preference (Closed)

Assignee: Mathias Stearn
Reporter: Alexander Komyagin (Inactive)
Participants: Alexander Komyagin, Githook User, Mathias Stearn
Votes: 0
Watchers: 4

Created: Feb 03 2014 05:14:20 PM UTC
Updated: Jul 11 2016 05:18:32 PM UTC
Resolved: Feb 05 2014 11:51:56 PM UTC
