Core Server / SERVER-80575

Flow control is engaged and the sustainer point is not moving

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Server Triage
    • ALL

      In a 3-node replica set (no sharding), at present:

      p1 - secondary

      p2 - primary

      dr - secondary

      When I connect using mongosh on each of those nodes:

      On p1: mongosh <--all-envs> --port 7253

      takes close to 8-10 seconds to reach the mongosh prompt

      rs.status()

        members: [
          {
            _id: 0,
            name: 'p1:7253',
            health: 1,
            state: 2,
            stateStr: 'SECONDARY',
            uptime: 367399,
            optime: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") },
            optimeDate: ISODate("2023-08-28T07:15:07.000Z"),
            lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: 'Could not find member to sync from',
            configVersion: 19,
            configTerm: 47,
            self: true,
            lastHeartbeatMessage: ''
          },
          {
            _id: 1,
            name: 'p2:7253',
            health: 0,
            state: 8,
            stateStr: '(not reachable/healthy)',
            uptime: 0,
            optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
            optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
            optimeDate: ISODate("1970-01-01T00:00:00.000Z"),
            optimeDurableDate: ISODate("1970-01-01T00:00:00.000Z"),
            lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastHeartbeat: ISODate("2023-08-28T07:15:29.974Z"),
            lastHeartbeatRecv: ISODate("2023-08-31T08:19:31.951Z"),
            pingMs: Long("0"),
            lastHeartbeatMessage: "Couldn't get a connection within the time limit",
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: '',
            configVersion: 19,
            configTerm: 47
          },
          {
            _id: 2,
            name: 'dr:7253',
            health: 0,
            state: 8,
            stateStr: '(not reachable/healthy)',
            uptime: 0,
            optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
            optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
            optimeDate: ISODate("1970-01-01T00:00:00.000Z"),
            optimeDurableDate: ISODate("1970-01-01T00:00:00.000Z"),
            lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastHeartbeat: ISODate("2023-08-31T03:50:33.417Z"),
            lastHeartbeatRecv: ISODate("2023-08-31T08:19:34.226Z"),
            pingMs: Long("80"),
            lastHeartbeatMessage: "Couldn't get a connection within the time limit",
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: '',
            configVersion: 19,
            configTerm: 47
          },
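      As an aside, the asymmetry above (p1 considers both peers unhealthy while they consider themselves fine) is easy to pull out of an `rs.status()`-style document programmatically. A minimal plain-JavaScript sketch, using a cut-down copy of the member fields shown above (the field subset is illustrative, not the full `rs.status()` shape):

```javascript
// Given an rs.status()-style members array (values copied from the
// output above), list the members this node considers unreachable.
const members = [
  { _id: 0, name: "p1:7253", health: 1, stateStr: "SECONDARY" },
  { _id: 1, name: "p2:7253", health: 0, stateStr: "(not reachable/healthy)" },
  { _id: 2, name: "dr:7253", health: 0, stateStr: "(not reachable/healthy)" },
];

const unhealthy = members
  .filter((m) => m.health === 0)
  .map((m) => m.name);

console.log(unhealthy); // → [ 'p2:7253', 'dr:7253' ]
```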
      
      

      mongo7253 [direct: secondary] test> rs.printSecondaryReplicationInfo()
      source: p1lc1mon029.ger.corp.intel.com:7253

      {   syncedTo: 'Mon Aug 28 2023 07:15:07 GMT+0000 (Coordinated Universal Time)',   replLag: '0 secs (0 hrs) behind the freshest member (no primary available at the moment)' }


      source: p2:17253

      { 'no replication info, yet.  State': '(not reachable/healthy)' }


      source: dr:7253

      { 'no replication info, yet.  State': '(not reachable/healthy)' }

      mongo7253 [direct: secondary] test>

      mongo7253 [direct: secondary] test> rs.printReplicationInfo()
      actual oplog size
      '2048 MB'

      configured oplog size
      '2048 MB'

      log length start to end
      '45966145 secs (12768.37 hrs)'

      oplog first event time
      'Mon Mar 14 2022 06:52:42 GMT+0000 (Coordinated Universal Time)'

      oplog last event time
      'Mon Aug 28 2023 07:15:07 GMT+0000 (Coordinated Universal Time)'

      now
      'Thu Aug 31 2023 08:22:47 GMT+0000 (Coordinated Universal Time)'
      mongo7253 [direct: secondary] test>
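      The reported "log length start to end" can be cross-checked directly from the oplog first/last event times; a quick sanity check in plain JavaScript, with the timestamps copied from the output above:

```javascript
// Cross-check "log length start to end" against the oplog
// first/last event times printed by rs.printReplicationInfo().
const first = new Date("2022-03-14T06:52:42Z"); // oplog first event time
const last = new Date("2023-08-28T07:15:07Z");  // oplog last event time
const lengthSecs = (last - first) / 1000;

console.log(lengthSecs);                    // → 45966145 (matches the output)
console.log((lengthSecs / 3600).toFixed(2)); // → '12768.37' hrs
```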

      On p2: mongosh <--all-envs> --port 7253

      immediately gives the mongosh prompt

      mongo7253 [direct: primary] test> show dbs
      admin          336.00 KiB
      config           1.29 MiB
      XYZ             9.05 MiB
      local          132.86 MiB
      mongo7253 [direct: primary] test>

      rs.status()

        members: [
          {
            _id: 0,
            name: 'p1:7253',
            health: 1,
            state: 2,
            stateStr: 'SECONDARY',
            uptime: 263345,
            optime: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") },
            optimeDurable: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") },
            optimeDate: ISODate("2023-08-28T07:15:07.000Z"),
            optimeDurableDate: ISODate("2023-08-28T07:15:07.000Z"),
            lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastHeartbeat: ISODate("2023-08-31T08:24:36.250Z"),
            lastHeartbeatRecv: ISODate("2023-08-31T08:24:36.228Z"),
            pingMs: Long("1042"),
            lastHeartbeatMessage: '',
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: '',
            configVersion: 19,
            configTerm: 47
          },
          {
            _id: 1,
            name: 'p2:7253',
            health: 1,
            state: 1,
            stateStr: 'PRIMARY',
            uptime: 263355,
            optime: { ts: Timestamp({ t: 1693470270, i: 1 }), t: Long("50") },
            optimeDate: ISODate("2023-08-31T08:24:30.000Z"),
            lastAppliedWallTime: ISODate("2023-08-31T08:24:30.694Z"),
            lastDurableWallTime: ISODate("2023-08-31T08:24:30.694Z"),
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: '',
            electionTime: Timestamp({ t: 1693206944, i: 1 }),
            electionDate: ISODate("2023-08-28T07:15:44.000Z"),
            configVersion: 19,
            configTerm: 50,
            self: true,
            lastHeartbeatMessage: ''
          },
          {
            _id: 2,
            name: 'dr:7253',
            health: 1,
            state: 2,
            stateStr: 'SECONDARY',
            uptime: 18194,
            optime: { ts: Timestamp({ t: 1693470270, i: 1 }), t: Long("50") },
            optimeDurable: { ts: Timestamp({ t: 1693470270, i: 1 }), t: Long("50") },
            optimeDate: ISODate("2023-08-31T08:24:30.000Z"),
            optimeDurableDate: ISODate("2023-08-31T08:24:30.000Z"),
            lastAppliedWallTime: ISODate("2023-08-31T08:24:30.694Z"),
            lastDurableWallTime: ISODate("2023-08-31T08:24:30.694Z"),
            lastHeartbeat: ISODate("2023-08-31T08:24:34.685Z"),
            lastHeartbeatRecv: ISODate("2023-08-31T08:24:35.853Z"),
            pingMs: Long("82"),
            lastHeartbeatMessage: '',
            syncSourceHost: 'p2:7253',
            syncSourceId: 1,
            infoMessage: '',
            configVersion: 19,
            configTerm: 50
          },
      

      mongo7253 [direct: primary] test> rs.printSecondaryReplicationInfo()
      source: p1:7253

      {   syncedTo: 'Mon Aug 28 2023 07:15:07 GMT+0000 (Coordinated Universal Time)',   replLag: '-263413 secs (-73.17 hrs) behind the primary ' }


      source: dr:7253

      {   syncedTo: 'Thu Aug 31 2023 08:25:20 GMT+0000 (Coordinated Universal Time)',   replLag: '0 secs (0 hrs) behind the primary ' }

      mongo7253 [direct: primary] test>
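      The negative replLag reported for p1 is consistent with the lag being computed as p1's last-synced time minus the freshest member's time; the sign is negative because p1's last-applied optime is about 73 hours older. A worked check in plain JavaScript with the timestamps from the output above (which reference optime mongosh actually subtracts is an assumption here; this merely reproduces the printed figure, using dr's syncedTo as the freshest point):

```javascript
// Reproduce the "-263413 secs (-73.17 hrs)" figure from
// rs.printSecondaryReplicationInfo() using the printed timestamps.
const p1SyncedTo = new Date("2023-08-28T07:15:07Z");    // p1's syncedTo
const freshest = new Date("2023-08-31T08:25:20Z");      // dr's syncedTo (assumed reference)
const lagSecs = (p1SyncedTo - freshest) / 1000;

console.log(lagSecs);                     // → -263413
console.log((lagSecs / 3600).toFixed(2)); // → '-73.17' hrs
```

      In other words, the negative number means p1 is ~73 hours *behind*, not ahead.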

      mongo7253 [direct: primary] test> rs.printReplicationInfo()
      actual oplog size
      '2048 MB'

      configured oplog size
      '2048 MB'

      log length start to end
      '46229587 secs (12841.55 hrs)'

      oplog first event time
      'Mon Mar 14 2022 06:52:53 GMT+0000 (Coordinated Universal Time)'

      oplog last event time
      'Thu Aug 31 2023 08:26:00 GMT+0000 (Coordinated Universal Time)'

      now
      'Thu Aug 31 2023 08:26:03 GMT+0000 (Coordinated Universal Time)'
      mongo7253 [direct: primary] test>

      On dr: mongosh <--all-envs> --port 7253

      rs.status()

        members: [
          {
            _id: 0,
            name: 'p1:7253',
            health: 1,
            state: 2,
            stateStr: 'SECONDARY',
            uptime: 18319,
            optime: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") },
            optimeDurable: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") },
            optimeDate: ISODate("2023-08-28T07:15:07.000Z"),
            optimeDurableDate: ISODate("2023-08-28T07:15:07.000Z"),
            lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"),
            lastHeartbeat: ISODate("2023-08-31T08:26:33.995Z"),
            lastHeartbeatRecv: ISODate("2023-08-31T08:26:37.495Z"),
            pingMs: Long("959"),
            lastHeartbeatMessage: '',
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: '',
            configVersion: 19,
            configTerm: 47
          },
          {
            _id: 1,
            name: 'p2:7253',
            health: 1,
            state: 1,
            stateStr: 'PRIMARY',
            uptime: 18320,
            optime: { ts: Timestamp({ t: 1693470388, i: 2 }), t: Long("50") },
            optimeDurable: { ts: Timestamp({ t: 1693470388, i: 2 }), t: Long("50") },
            optimeDate: ISODate("2023-08-31T08:26:28.000Z"),
            optimeDurableDate: ISODate("2023-08-31T08:26:28.000Z"),
            lastAppliedWallTime: ISODate("2023-08-31T08:26:28.836Z"),
            lastDurableWallTime: ISODate("2023-08-31T08:26:28.836Z"),
            lastHeartbeat: ISODate("2023-08-31T08:26:36.658Z"),
            lastHeartbeatRecv: ISODate("2023-08-31T08:26:37.492Z"),
            pingMs: Long("82"),
            lastHeartbeatMessage: '',
            syncSourceHost: '',
            syncSourceId: -1,
            infoMessage: '',
            electionTime: Timestamp({ t: 1693206944, i: 1 }),
            electionDate: ISODate("2023-08-28T07:15:44.000Z"),
            configVersion: 19,
            configTerm: 50
          },
          {
            _id: 2,
            name: 'dr:7253',
            health: 1,
            state: 2,
            stateStr: 'SECONDARY',
            uptime: 18334,
            optime: { ts: Timestamp({ t: 1693470388, i: 2 }), t: Long("50") },
            optimeDate: ISODate("2023-08-31T08:26:28.000Z"),
            lastAppliedWallTime: ISODate("2023-08-31T08:26:28.836Z"),
            lastDurableWallTime: ISODate("2023-08-31T08:26:28.836Z"),
            syncSourceHost: 'p2:7253',
            syncSourceId: 1,
            infoMessage: '',
            configVersion: 19,
            configTerm: 50,
            self: true,
            lastHeartbeatMessage: ''
          },
      

      mongo7253 [direct: secondary] test> rs.printSecondaryReplicationInfo()
      source: p1:7253

      {   syncedTo: 'Mon Aug 28 2023 07:15:07 GMT+0000 (Coordinated Universal Time)',   replLag: '-263523 secs (-73.2 hrs) behind the primary ' }


      source: dr:7253

      {   syncedTo: 'Thu Aug 31 2023 08:27:10 GMT+0000 (Coordinated Universal Time)',   replLag: '0 secs (0 hrs) behind the primary ' }

      mongo7253 [direct: secondary] test>

      mongo7253 [direct: secondary] test> rs.printReplicationInfo()
      actual oplog size
      '2048 MB'

      configured oplog size
      '2048 MB'

      log length start to end
      '46229695 secs (12841.58 hrs)'

      oplog first event time
      'Mon Mar 14 2022 06:53:15 GMT+0000 (Coordinated Universal Time)'

      oplog last event time
      'Thu Aug 31 2023 08:28:10 GMT+0000 (Coordinated Universal Time)'

      now
      'Thu Aug 31 2023 08:28:12 GMT+0000 (Coordinated Universal Time)'
      mongo7253 [direct: secondary] test>

      =================

       

      When I tail the log for REPL messages:

      On p1: I keep getting these messages:
      tail -f /instances/mongo_db7253/logs/mongodb.log | grep REPL

      {"t":{"$date":"2023-08-31T08:28:58.394+00:00"},"s":"I",  "c":"REPL",     "id":21401,   "ctx":"conn26379","msg":"Scheduling heartbeat to fetch a newer config","attr":{"configTerm":50,"configVersion":19,"senderHost":"dr:7253"}}

      {"t":{"$date":"2023-08-31T08:29:00.682+00:00"},"s":"I",  "c":"REPL",     "id":21401,   "ctx":"conn2169","msg":"Scheduling heartbeat to fetch a newer config","attr":{"configTerm":50,"configVersion":19,"senderHost":"p2:7253"}}

      On p2 & dr: I get nothing from
      tail -f /instances/mongo_db7253/logs/mongodb.log | grep REPL

      However, I do get the following from
      tail -f /instances/mongo_db7253/logs/mongodb.log | grep FlowControlRefresher

      {"t":{"$date":"2023-08-31T08:31:39.550+00:00"},"s":"W",  "c":"STORAGE",  "id":22225,   "ctx":"FlowControlRefresher","msg":"Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries."}
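      For context: flow control throttles writes on the primary when the majority-commit point (the "sustainer point") lags, and this warning fires when throttling is engaged yet that point still is not advancing between refresh ticks — which matches p1 here being stuck at the Aug 28 optime. A toy model of that check in plain JavaScript (an illustration of the message's semantics only, not MongoDB's actual implementation):

```javascript
// Toy model (illustration only): the refresher warns when flow control
// is engaged but the majority-commit ("sustainer") point has not moved
// since the previous refresh tick.
function refresh(prevCommitPoint, currCommitPoint, engaged) {
  if (engaged && currCommitPoint <= prevCommitPoint) {
    return "Flow control is engaged and the sustainer point is not moving. " +
           "Please check the health of all secondaries.";
  }
  return null; // commit point advancing, or flow control not engaged
}

// p1 stuck at the same optime across two refreshes -> warning fires.
const stuck = refresh(1693206907, 1693206907, true);
console.log(stuck !== null); // → true

// Healthy cluster: commit point advances, no warning.
console.log(refresh(1693470270, 1693470388, true)); // → null
```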

      ===================

      Questions:

      1. Why is my p1 showing a negative lag number (-73 hours)?
      2. Which node is the primary, p1 or p2?
         p1 looks like the primary because it appears ahead of the other 2 nodes per rs.printSecondaryReplicationInfo() on p2 and dr;
         p2 looks like the primary because, when connected locally using mongosh, it shows as primary.
      3. Why does p1 say "no primary available at the moment" when rs.printSecondaryReplicationInfo() is run from p1?
      4. Why do p2 and dr show as '(not reachable/healthy)' when rs.status() is run on p1,
         yet p1 and dr show as secondary and p2 as primary when rs.status() is run on p2 and dr?
      5. Why do p2 and dr show lastHeartbeatMessage: "Couldn't get a connection within the time limit" when rs.status() is run on p1, but show as healthy on p2 and dr?
      6. Is p1 a primary that is trying to replicate to p2 and dr but failing, with p2 a false primary?
      7. What is the bug with flow control that displays the message
         "Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries."?
      8. How do I fix this replication issue?

      My MongoDB server version is 5.0.5 on all 3 nodes.

            Assignee:
            rhea.thorne@mongodb.com Alison Rhea Thorne
            Reporter:
            bharath_achar@outlook.com Bharath Kumar CM
            Votes:
            0
            Watchers:
            3