Documentation / DOCS-14602

[SERVER] Incomplete information about role of heartbeats in replication lag calculation

      Description

      The rs.printSecondaryReplicationInfo() function is supposed to report how far the secondaries are behind the primary. db.printReplicationInfo() also shows the last timestamp from the oplog, and apparently both use the same time.

      During idle periods when no new data is written, this is supposed to rely on heartbeats sent from the primary to the secondaries, if I read the note at:

      https://docs.mongodb.com/manual/reference/method/rs.printSecondaryReplicationInfo/

      correctly:

      A member may show a negative time value behind the primary when rs.printSecondaryReplicationInfo() is run. This is expected if rs.printSecondaryReplicationInfo() is run after a secondary replicates a write that follows a period of inactivity, but before the secondary receives a heartbeat from the primary with the latest optime.

      Now, although settings.heartbeatIntervalMillis (https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.settings.heartbeatIntervalMillis) is set to 2 seconds, the timestamps reported by the functions mentioned above are updated only every 10 seconds:

      shard01:PRIMARY> rs.status()
      (...)
       "heartbeatIntervalMillis" : NumberLong(2000),

      The two outputs below were obtained less than 1 second apart:

      shard01:PRIMARY> rs.printSecondaryReplicationInfo()
      source: localhost:3501
       syncedTo: Fri Jun 25 2021 16:57:58 GMT+0200 (CEST)
       10 secs (0 hrs) behind the primary 
      source: localhost:3503
       syncedTo: Fri Jun 25 2021 16:57:58 GMT+0200 (CEST)
       10 secs (0 hrs) behind the primary
      
      shard01:PRIMARY> rs.printSecondaryReplicationInfo()
      source: localhost:3501
       syncedTo: Fri Jun 25 2021 16:58:08 GMT+0200 (CEST)
       0 secs (0 hrs) behind the primary 
      source: localhost:3503
       syncedTo: Fri Jun 25 2021 16:58:08 GMT+0200 (CEST)
       0 secs (0 hrs) behind the primary

      This not only makes the function frequently report a false lag of 10 seconds, it also limits lag checking to a 10-second granularity.

      Which setting controls the heartbeat frequency used by these functions? Is it adjustable?
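
      For cross-checking the figures above by hand, here is a minimal sketch of a per-secondary lag calculation. It assumes lag is simply the primary's optimeDate minus each secondary's optimeDate, using the name, stateStr and optimeDate fields that rs.status() reports for each member. The helper name secondaryLagSeconds is hypothetical, and this is not necessarily how rs.printSecondaryReplicationInfo() computes its output internally.

```javascript
// Hypothetical helper: estimate per-secondary lag from an rs.status()-style document.
// Assumes every member has `name`, `stateStr` and `optimeDate` fields.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in status document");
  }
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr !== "SECONDARY") continue;
    // Date subtraction yields milliseconds. A negative value means the
    // secondary's optime is newer than the one reported for the primary.
    lag[m.name] = Math.round((primary.optimeDate - m.optimeDate) / 1000);
  }
  return lag;
}
```

      In the shell this could be called as secondaryLagSeconds(rs.status()). Running it repeatedly during an idle period should show whether the underlying optimeDate values themselves advance only in 10-second steps.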

            Assignee: Unassigned
            Reporter: Przemek Malkowski (przemek.malkowski@gmail.com)
            Votes: 0
            Watchers: 4

              Created:
              Updated:
              1 year, 27 weeks, 3 days ago