[SERVER-12583] pcursor doesn't check the last used node status from ReplicaSetMonitor
Created: 03/Feb/14  Updated: 11/Jul/16  Resolved: 05/Feb/14

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.4.9
Fix Version/s: 2.6.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Alexander Komyagin
Assignee: Mathias Stearn
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Sharded cluster with one shard (replica set shard01): hosta:30000, hostb:30001, and hostb:30002.


Issue Links:
Related
is related to SERVER-13125 DBClientRS should check that pinned h... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   
  1. Issue secondary reads through mongos.
  2. A connection to a secondary (hostb:30002) is pinned for reuse.
  3. That secondary is black-holed: packets sent to it are silently dropped.
  4. The next query tries to reuse the pinned secondary even though the ReplicaSetMonitor has already detected that the node is unreachable.
  5. The query hangs until the TCP timeout fires (about 15 minutes by default).
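Step 4 can be illustrated with a minimal C++ sketch of the faulty selection decision. This is not MongoDB's code: `NodeView` and `reuseLastHostBeforeFix` are hypothetical names, and the model assumes the last used host is reused whenever it is still a member and still a secondary. The point it shows is that the monitor's `ok` flag (compare the `nodes[i].ok` lines in the sample log) is never consulted.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical view of one member as the ReplicaSetMonitor reports it.
// Not MongoDB's real type.
struct NodeView {
    bool isPrimary;
    bool ok;  // false once the monitor can no longer reach the host
};

// Hypothetical sketch of the reported (pre-fix) behavior: the last used host
// is reused whenever it is still a member and still compatible with a
// secondary-only read preference. NodeView::ok is never checked, so a host
// the monitor has already marked unreachable can still be selected.
std::optional<std::string> reuseLastHostBeforeFix(
        const std::map<std::string, NodeView>& monitorView,
        const std::string& lastHost) {
    auto it = monitorView.find(lastHost);
    if (it == monitorView.end()) return std::nullopt;  // no longer a member
    if (it->second.isPrimary) return std::nullopt;     // not a secondary
    return lastHost;                                   // ok flag ignored
}
```

With hostb:30002 marked `ok = false`, as in the sample log, the sketch still returns hostb:30002, which is the reuse described in step 4.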

Sample mongos log:

Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[0].ok = true hosta:30000
Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[1].ok = false hostb:30001
Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[2].ok = false hostb:30002
Fri Jan 31 17:09:05.637 [conn5] trying reconnect to hostb:30001
Fri Jan 31 17:09:10.636 [conn5] reconnect hostb:30001 failed couldn't connect to server hostb:30001
Fri Jan 31 17:09:10.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30001 socket exception [CONNECT_ERROR] for hostb:30001
Fri Jan 31 17:09:10.636 [conn5] trying reconnect to hostb:30002
Fri Jan 31 17:09:15.636 [conn5] reconnect hostb:30002 failed couldn't connect to server hostb:30002
Fri Jan 31 17:09:15.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30002 socket exception [CONNECT_ERROR] for hostb:30002
Fri Jan 31 17:09:16.636 [conn5] warning: No primary detected for set shard01
Fri Jan 31 17:09:16.636 [conn5] User Assertion: 10009:ReplicaSetMonitor no master found for set: shard01
Fri Jan 31 17:09:16.636 [conn5] dbclient_rs say using secondary or tagged node selection in shard01, read pref is { pref: "secondary only", tags: [ {} ] } (primary : hostb:30001, lastTagged : hostb:30002)
Fri Jan 31 17:09:16.636 [conn5] dbclient_rs selecting compatible last used node hostb:30002
Fri Jan 31 17:09:16.637 [conn5] [pcursor] initialized query (lazily) on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing over 1 shards
Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 Connection timed out 10.225.15.113:30002
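The gap between the last pcursor line (17:09:16.637) and the recv() error (17:24:47.816) is about 931 seconds. Assuming the mongos host runs Linux (errno 110 is ETIMEDOUT there) with default settings, this matches the kernel's retransmission limit: net.ipv4.tcp_retries2 defaults to 15, and if the retransmission timeout (RTO) starts at the 200 ms minimum, doubles after each retry and is capped at 120 s, the kernel documentation gives a hypothetical timeout of 924.6 seconds, which it describes as a lower bound. The sketch below only reproduces that arithmetic. `tcpGiveUpSeconds` is a hypothetical helper, and real RTO values also depend on the measured round-trip time.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: lower-bound estimate, in seconds, of how long Linux
// keeps retransmitting unacknowledged data before failing the socket with
// ETIMEDOUT. Assumes the RTO starts at rtoMin, doubles after every
// retransmission, and is capped at rtoMax (TCP_RTO_MIN = 0.2 s and
// TCP_RTO_MAX = 120 s by default).
double tcpGiveUpSeconds(int retries2, double rtoMin = 0.2, double rtoMax = 120.0) {
    assert(retries2 >= 0 && rtoMin > 0.0 && rtoMin <= rtoMax);
    double total = 0.0;
    double rto = rtoMin;
    // One timeout after the original send, plus one after each of the
    // retries2 retransmissions.
    for (int i = 0; i <= retries2; ++i) {
        total += rto;
        rto = std::min(rto * 2.0, rtoMax);
    }
    return total;
}
```

`tcpGiveUpSeconds(15)` sums ten doubling intervals (0.2 s up to 102.4 s, 204.6 s in total) and six capped 120 s intervals, for roughly 924.6 s, or about 15.4 minutes.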



 Comments   
Comment by Githook User [ 05/Feb/14 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-12583 DBClientRS shouldn't use hosts the RSM thinks are down.

Also commented and ordered checks from cheapest to most expensive in
DBClientRS::checkLastHost.
Branch: master
https://github.com/mongodb/mongo/commit/f2cf9f3ee5efce282db31e4f5de4293fa5e2382a
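The commit message describes two changes. First, DBClientRS no longer uses hosts that the ReplicaSetMonitor considers down. Second, the checks in DBClientRS::checkLastHost are now ordered from cheapest to most expensive. The following C++ code is a hypothetical sketch of that idea only. `NodeView`, `ReadPref`, `LastHostCache` and `checkLastHostSketch` are invented for illustration. The specific checks and their order are assumptions, not a transcription of the actual commit.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the monitor's per-host state (not MongoDB's API).
struct NodeView {
    bool isPrimary;
    bool ok;  // false once the monitor's health checks fail
};

enum class ReadPref { PrimaryOnly, SecondaryOnly, Nearest };

// Hypothetical state a replica-set client keeps between queries.
struct LastHostCache {
    std::string host;              // empty when nothing is pinned
    std::optional<ReadPref> pref;  // read preference used for the pinned host
};

// Sketch of the fixed behavior: reuse the pinned host only if every check
// passes, with the cheap local checks ahead of the monitor lookup.
std::optional<std::string> checkLastHostSketch(
        const LastHostCache& cache, ReadPref pref,
        const std::map<std::string, NodeView>& monitorView) {
    // 1. Cheapest: nothing is pinned yet.
    if (cache.host.empty()) return std::nullopt;
    // 2. A different read preference goes through normal node selection.
    if (!cache.pref || *cache.pref != pref) return std::nullopt;
    // 3. The new check: skip the host if the monitor thinks it is down.
    auto it = monitorView.find(cache.host);
    if (it == monitorView.end() || !it->second.ok) return std::nullopt;
    // 4. The host must still satisfy the read preference.
    if (pref == ReadPref::SecondaryOnly && it->second.isPrimary) return std::nullopt;
    if (pref == ReadPref::PrimaryOnly && !it->second.isPrimary) return std::nullopt;
    return cache.host;
}
```

With the monitor state from the sample log (hostb:30002 marked `ok = false`), the sketch returns no cached host. A real client would then fall back to full node selection instead of blocking on the black-holed secondary until the TCP timeout.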

Generated at Thu Feb 08 03:28:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.