  Core Server / SERVER-12583

pcursor doesn't check the last used node status from ReplicaSetMonitor

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.4.9
    • Fix Version/s: 2.6.0-rc0
    • Component/s: Internal Client
    • Labels:
      None
    • Environment:
      sharded cluster with 1 shard: hosta:30000, hostb:30001 and hostb:30002
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      1. Issue secondary reads through mongos.
      2. A secondary connection is pinned (hostb:30002).
      3. The secondary is blackholed (its packets are dropped).
      4. The next query tries to reuse the dead secondary even though the replica set monitor has already detected that the node is unreachable.
      5. The query hangs until the TCP timeout expires (15 minutes by default).
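
The behaviour the steps above expect can be sketched roughly as follows. This is only an illustration in Python, not the mongos code and not the change that shipped in 2.6.0-rc0: `MonitorView` and `select_secondary` are hypothetical names, and the health map simply mirrors the `nodes[i].ok` values from the log in this ticket.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MonitorView:
    """Hypothetical stand-in for the monitor's per-node health flags."""
    ok: Dict[str, bool] = field(default_factory=dict)

    def is_ok(self, host: str) -> bool:
        # Hosts the monitor has never seen are treated as down.
        return self.ok.get(host, False)


def select_secondary(monitor: MonitorView, primary: str,
                     last_used: Optional[str]) -> Optional[str]:
    """Reuse the pinned node only if the monitor currently reports it as up."""
    if last_used is not None and last_used != primary and monitor.is_ok(last_used):
        return last_used
    # Otherwise fall back to any other node that is up and is not the primary.
    for host, healthy in monitor.ok.items():
        if healthy and host != primary:
            return host
    return None


# Node states taken from the log in this ticket.
view = MonitorView(ok={
    "hosta:30000": True,
    "hostb:30001": False,
    "hostb:30002": False,
})
print(select_secondary(view, primary="hostb:30001", last_used="hostb:30002"))
# prints "hosta:30000"
```

With a check like this, the pinned hostb:30002 would be skipped because the monitor already marks it as not ok, instead of being selected as the "compatible last used node".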

      Sample mongos log:

      Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[0].ok = true hosta:30000
      Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[1].ok = false hostb:30001
      Fri Jan 31 17:09:05.637 [conn5] dbclient_rs nodes[2].ok = false hostb:30002
      Fri Jan 31 17:09:05.637 [conn5] trying reconnect to hostb:30001
      Fri Jan 31 17:09:10.636 [conn5] reconnect hostb:30001 failed couldn't connect to server hostb:30001
      Fri Jan 31 17:09:10.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30001 socket exception [CONNECT_ERROR] for hostb:30001
      Fri Jan 31 17:09:10.636 [conn5] trying reconnect to hostb:30002
      Fri Jan 31 17:09:15.636 [conn5] reconnect hostb:30002 failed couldn't connect to server hostb:30002
      Fri Jan 31 17:09:15.636 [conn5] ReplicaSetMonitor::_checkConnection: caught exception hostb:30002 socket exception [CONNECT_ERROR] for hostb:30002
      Fri Jan 31 17:09:16.636 [conn5] warning: No primary detected for set shard01
      Fri Jan 31 17:09:16.636 [conn5] User Assertion: 10009:ReplicaSetMonitor no master found for set: shard01
      Fri Jan 31 17:09:16.636 [conn5] dbclient_rs say using secondary or tagged node selection in shard01, read pref is { pref: "secondary only", tags: [ {} ] } (primary : hostb:30001, lastTagged : hostb:30002)
      Fri Jan 31 17:09:16.636 [conn5] dbclient_rs selecting compatible last used node hostb:30002
      Fri Jan 31 17:09:16.637 [conn5] [pcursor] initialized query (lazily) on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
      Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing over 1 shards
      Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing on shard shard01:shard01/hosta:30000,hostb:30001,hostb:30002, current connection state is { state: { conn: "shard01/hosta:30000,hostb:30001,hostb:30002", vinfo: "shard01:shard01/hosta:30000,hostb:30001,hostb:30002", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
      Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 Connection timed out 10.225.15.113:30002
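
To put a number on the stall, the gap between the last pcursor line and the recv() timeout can be computed from the timestamps above. The snippet below is a small sketch with hypothetical helper names (`log_time`, `stall_seconds`); it assumes both lines were logged on the same calendar day and that the time is the fourth whitespace-separated token.

```python
from datetime import datetime


def log_time(line: str) -> datetime:
    """Parse the HH:MM:SS.mmm field (fourth token) of a log line."""
    return datetime.strptime(line.split()[3], "%H:%M:%S.%f")


def stall_seconds(first_line: str, last_line: str) -> float:
    # Assumes both lines were logged on the same calendar day.
    return (log_time(last_line) - log_time(first_line)).total_seconds()


stall = stall_seconds(
    "Fri Jan 31 17:09:16.637 [conn5] [pcursor] finishing over 1 shards",
    "Fri Jan 31 17:24:47.816 [conn5] Socket recv() errno:110 "
    "Connection timed out 10.225.15.113:30002",
)
print(f"{stall:.3f} s")
# prints "931.179 s"
```

That is roughly 15.5 minutes, in line with the TCP timeout mentioned in the description.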

              People

              Assignee:
              redbeard0531 Mathias Stearn
              Reporter:
              alex.komyagin Alexander Komyagin
              Votes:
              0
              Watchers:
              4