Core Server / SERVER-12583

pcursor doesn't check the last used node status from ReplicaSetMonitor

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.0-rc0
    • Affects Version/s: 2.4.9
    • Component/s: Internal Client
    • Labels:
      None
    • Environment:
      sharded cluster with 1 shard: hosta:30000, hostb:30001 and hostb:30002
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      1. Issue secondary reads from mongos.
      2. A secondary connection is pinned (hostb:30002).
      3. The secondary goes into a black hole (packets are dropped).
      4. The next query tries to reuse the dead secondary, even though the replica set monitor has already detected that the node is unreachable.
      5. Observe the TCP timeout (15 minutes by default).
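
The behavior the steps above call for can be sketched as follows. This is a minimal illustration, not the mongos code and not necessarily the fix that shipped: `MonitorView` and `select_secondary` are hypothetical names, and treating hosta:30000 as an eligible secondary is an assumption made only for the example. The point is that a pinned (last used) node should be reused only while the monitor still reports it as healthy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MonitorView:
    """Hypothetical stand-in for the replica set monitor's per-node ok flags."""
    ok: Dict[str, bool] = field(default_factory=dict)

    def is_ok(self, host: str) -> bool:
        # Unknown hosts are treated as unhealthy.
        return self.ok.get(host, False)


def select_secondary(monitor: MonitorView,
                     secondaries: List[str],
                     last_used: Optional[str]) -> Optional[str]:
    """Pick a node for a secondary read, checking the pinned node's status first."""
    # Reuse the pinned node only if the monitor still considers it healthy.
    if last_used in secondaries and monitor.is_ok(last_used):
        return last_used
    # Otherwise fall back to the first healthy secondary, if there is one.
    for host in secondaries:
        if monitor.is_ok(host):
            return host
    # No healthy candidate: the caller can fail fast instead of waiting on a dead socket.
    return None


# Node flags as reported in the sample log; the candidate list is an assumption.
monitor = MonitorView({"hosta:30000": True,
                       "hostb:30001": False,
                       "hostb:30002": False})
chosen = select_secondary(monitor, ["hosta:30000", "hostb:30002"], "hostb:30002")
print(chosen)
```

With the flags from the log, the pinned hostb:30002 is skipped because it is marked not ok, so the query never blocks on the unreachable node.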

      Sample mongos log:

      Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[0].ok = true hosta:30000
      Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[1].ok = false hostb:30001
      Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[2].ok = false hostb:30002
      Fri Jan 31 17:09:05.637 [conn5] trying reconnect to hostb:30001
      Fri Jan 31 17:09:10.636 [conn5] reconnect hostb:30001 failed couldn't connect to server hostb:30001
      Fri Jan 31 17:09:10.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30001 socket exception [CONNECT_ERROR] for hostb:30001
      Fri Jan 31 17:09:10.636 [conn5] trying reconnect to hostb:30002
      Fri Jan 31 17:09:15.636 [conn5] reconnect hostb:30002 failed couldn't connect to server hostb:30002
      Fri Jan 31 17:09:15.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30002 socket exception [CONNECT_ERROR] for hostb:30002
      Fri Jan 31 17:09:16.636 [conn5] warning: No primary detected for set shard01
      Fri Jan 31 17:09:16.636 [conn5] User Assertion: 10009:ReplicaSetMonitor no master found for set: shard01
      Fri Jan 31 17:09:16.636 [conn5] dbclient_rs say using secondary or tagged node selection in shard01, read pref is { pref: "secondary only", tags: [ {} ] } (primary : hostb:30001, lastTagged : hostb:30002)
      Fri Jan 31 17:09:16.636 [conn5] dbclient_rs selecting compatible last used node hostb:30002
      Fri Jan 31 17:09:16.637 [conn5] [pcursor] initialized query (lazily) on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
      Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing over 1 shards
      Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
      Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 Connection timed out 10.225.15.113:30002
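
To make the stall in the log above concrete, the following sketch computes the gap between the last `[pcursor] finishing on shard` line and the `Socket recv()` error. It assumes both lines fall on the same day (true here, both are Fri Jan 31) and that the time of day is the fourth whitespace-separated field; `log_time` and `stall` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def log_time(line: str) -> datetime:
    """Parse the time-of-day field (e.g. 17:09:16.637) from a log line."""
    # Fields are: weekday, month, day, time, [connection], message...
    return datetime.strptime(line.split()[3], "%H:%M:%S.%f")


def stall(start_line: str, end_line: str) -> timedelta:
    """Elapsed time between two log lines written on the same day."""
    return log_time(end_line) - log_time(start_line)


start = "Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing on shard shard01"
end = "Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 Connection timed out"
print(stall(start, end).total_seconds())
```

For these two lines the gap is 15 minutes 31.179 seconds, which matches the roughly 15 minute TCP timeout described in the steps.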
      

            Assignee: mathias@mongodb.com Mathias Stearn
            Reporter: alex.komyagin@mongodb.com Alexander Komyagin (Inactive)
            Votes: 0
            Watchers: 4