[DOCS-14602] [SERVER] Incomplete information about role of heartbeats in replication lag calculation Created: 25/Jun/21  Updated: 30/Oct/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: 4.4
Fix Version/s: Server_Docs_20231030

Type: Improvement Priority: Major - P3
Reporter: Przemek Malkowski Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 1 year, 14 weeks, 2 days ago
Epic Link: DOCSP-11702

 Description   

Description

The rs.printSecondaryReplicationInfo() function is supposed to report how far the secondaries are behind the primary. db.printReplicationInfo() also shows the last timestamp from the oplog, and apparently both use the same time.

During idle periods when no new data is written, this is supposed to rely on heartbeats sent from the primary to the secondaries, if I read the comment from:

https://docs.mongodb.com/manual/reference/method/rs.printSecondaryReplicationInfo/

correctly:

A member may show a negative time value behind the primary when rs.printSecondaryReplicationInfo() is run. This is expected if rs.printSecondaryReplicationInfo() is run after a secondary replicates a write that follows a period of inactivity, but before the secondary receives a heartbeat from the primary with the latest optime.

Now, although https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.settings.heartbeatIntervalMillis is set to 2 seconds, the timestamps reported by the functions mentioned above are updated only every 10 seconds:

shard01:PRIMARY> rs.status()
(...)
 "heartbeatIntervalMillis" : NumberLong(2000),

The two outputs below were obtained less than 1 second apart:

 

shard01:PRIMARY> rs.printSecondaryReplicationInfo()
source: localhost:3501
 syncedTo: Fri Jun 25 2021 16:57:58 GMT+0200 (CEST)
 10 secs (0 hrs) behind the primary 
source: localhost:3503
 syncedTo: Fri Jun 25 2021 16:57:58 GMT+0200 (CEST)
 10 secs (0 hrs) behind the primary
 
shard01:PRIMARY> rs.printSecondaryReplicationInfo()
source: localhost:3501
 syncedTo: Fri Jun 25 2021 16:58:08 GMT+0200 (CEST)
 0 secs (0 hrs) behind the primary 
source: localhost:3503
 syncedTo: Fri Jun 25 2021 16:58:08 GMT+0200 (CEST)
 0 secs (0 hrs) behind the primary

 

This not only makes the function often report a false "10 secs behind" value, but also limits lag checking to once every 10 seconds.
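To make the reported numbers easier to reason about, here is a minimal sketch of how a "secs behind the primary" figure can be derived by comparing each secondary's optimeDate with the primary's in an rs.status()-shaped document. It is an illustration and not the shell helper's actual source. The sample document is hypothetical (including the primary's host name localhost:3502); only the field names members, name, stateStr and optimeDate are taken from replSetGetStatus output, and the timestamps mirror the first output above.

```javascript
// Sketch: derive "secs behind the primary" from an rs.status()-shaped object.
// The sample document below is hypothetical; only the field names
// (members, name, stateStr, optimeDate) follow replSetGetStatus output.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary in status document");
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      source: m.name,
      // Negative when the secondary's optime is newer than the primary's reported optime.
      lagSecs: Math.round((primary.optimeDate - m.optimeDate) / 1000),
    }));
}

// Hypothetical snapshot: the primary's last oplog entry is 10 seconds newer
// than the optime recorded for each secondary.
const sampleStatus = {
  members: [
    { name: "localhost:3501", stateStr: "SECONDARY", optimeDate: new Date("2021-06-25T14:57:58Z") },
    { name: "localhost:3502", stateStr: "PRIMARY", optimeDate: new Date("2021-06-25T14:58:08Z") },
    { name: "localhost:3503", stateStr: "SECONDARY", optimeDate: new Date("2021-06-25T14:57:58Z") },
  ],
};

const lags = secondaryLagSeconds(sampleStatus);
console.log(JSON.stringify(lags));
```

In a shell session, the same function could presumably be called as secondaryLagSeconds(rs.status()). The result can only be as fresh as the optimes known to the member answering rs.status(), which is why the refresh interval asked about here matters.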

Which setting controls the heartbeat frequency for these functions? Is it adjustable?

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Education Bot [ 31/Oct/22 ]

Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you!

Generated at Thu Feb 08 08:10:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.