[SERVER-9728] Is there any specific way (calculation) to check replication lag. Created: 20/May/13  Updated: 10/Dec/14  Resolved: 20/May/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Manish Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi,

I have found in MongoDB documentation that "An ISODate formatted date string that reflects the last entry from the oplog that this member applied. If this differs significantly from lastHeartbeat this member is either experiencing “replication lag” or there have not been any new operations since the last update"

My question is: is there a particular calculation, or some other method, that I can use to measure replication lag?

Below are the statistics against which I want to check for replication lag.

"members" : [
{

"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1137376,
"optime" : { "t" : 1369035838, "i" : 2 },
"optimeDate" : ISODate("2013-05-20T07:43:58Z"),
"lastHeartbeat" : ISODate("2013-05-20T11:41:24Z"),
"lastHeartbeatRecv" : ISODate("2013-05-20T11:41:25Z"),
"pingMs" : 0,

},
{
"_id" : 1,
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1139161,
"optime" : { "t" : 1369035838, "i" : 2 },
"optimeDate" : ISODate("2013-05-20T07:43:58Z"),
"self" : true
},
{
"_id" : 2,
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1137376,
"optime" : { "t" : 1369035838, "i" : 2 },
"optimeDate" : ISODate("2013-05-20T07:43:58Z"),
"lastHeartbeat" : ISODate("2013-05-20T11:41:24Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,

}
],

Thanks
MANISH



 Comments   
Comment by Daniel Pasette (Inactive) [ 02/Sep/13 ]

printSlaveReplicationInfo() compares the optime of each secondary to that of the primary, not to the current time. The old output of the command was confusing, which is why it was made clearer in this ticket: SERVER-7800.

Output now looks like this:

name:SECONDARY> db.printSlaveReplicationInfo()
source: Daniels-MacBook-Air-2.local:31001
	syncedTo: Mon Sep 02 2013 09:28:47 GMT-0400 (EDT)
	0 secs (0 hrs) behind the primary

Comment by Sumeet Sharma [ 30/Aug/13 ]

printSlaveReplicationInfo() reports how far behind the secondaries are relative to the *current time*.

This is slightly misleading because, when there are no operations, rs.status().members.optimeDate does not update. As a result, optimeDate - now() shows a large number even though the secondaries are not behind the primary. I wrote a function that compares against the primary instead (as Eliot mentioned):

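function getReplicationLag() {
    // Record the primary's optimeDate (state 1) and every other member's by name.
    var primaryOptime = null;
    var others = {};
    rs.status().members.forEach(function (m) {
        if (m.state === 1) {
            primaryOptime = m.optimeDate;
        } else {
            others[m.name] = m.optimeDate;
        }
    });
    if (primaryOptime === null) {
        print("no primary found");
        return;
    }
    // Lag is the primary's optimeDate minus the member's, converted from ms to seconds.
    for (var name in others) {
        print(name + " is " + (primaryOptime - others[name]) / 1000 + " seconds behind primary");
    }
}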
Eliot, please let me know if we can modify printSlaveReplicationInfo().
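
To make the idle-set effect concrete, here is a small sketch using the dates from the first SECONDARY entry in the description. It assumes lastHeartbeat is an acceptable stand-in for "now"; the helper name secondsBetween is made up for illustration and is not part of the shell.

```javascript
// Dates copied from the rs.status() output in the description.
var secondaryOptimeDate = new Date("2013-05-20T07:43:58Z");
var primaryOptimeDate = new Date("2013-05-20T07:43:58Z");
// Assumption: lastHeartbeat is used here as an approximation of the current time.
var lastHeartbeat = new Date("2013-05-20T11:41:24Z");

// Hypothetical helper: difference between two Dates in seconds.
function secondsBetween(later, earlier) {
    return (later - earlier) / 1000;
}

// Measured against the clock, the idle secondary looks hours behind.
var behindClock = secondsBetween(lastHeartbeat, secondaryOptimeDate);
// Measured against the primary's optimeDate, it is not behind at all.
var behindPrimary = secondsBetween(primaryOptimeDate, secondaryOptimeDate);

console.log("behind the clock:   " + behindClock + " s");   // 14246 s
console.log("behind the primary: " + behindPrimary + " s"); // 0 s
```

The first number only says that nothing has been written for almost four hours; the second is the one that describes lag.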

Comment by Eliot Horowitz (Inactive) [ 20/May/13 ]

You just need to compare the optime on a secondary to the primary.

So if you look at the optimes above, you'll see that they are the same for all nodes, so there is no lag.
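
As a rough sketch of that comparison, the snippet below applies it to the optime values from the description. The member names are placeholders (the pasted output does not show them), and lagBehindPrimary is an illustrative helper, not a shell built-in. It uses only the "t" field (seconds since the epoch) and ignores the "i" counter, which only orders operations within the same second.

```javascript
// state and optime copied from the description; names are placeholders.
var members = [
    { name: "member-a", state: 2, optime: { t: 1369035838, i: 2 } },
    { name: "member-b", state: 1, optime: { t: 1369035838, i: 2 } },
    { name: "member-c", state: 2, optime: { t: 1369035838, i: 2 } }
];

// Hypothetical helper: seconds each SECONDARY (state 2) is behind the PRIMARY (state 1).
function lagBehindPrimary(members) {
    var primary = members.filter(function (m) { return m.state === 1; })[0];
    if (!primary) {
        return null; // no primary visible, so lag cannot be computed this way
    }
    var lag = {};
    members.forEach(function (m) {
        if (m.state === 2) {
            lag[m.name] = primary.optime.t - m.optime.t;
        }
    });
    return lag;
}

var lag = lagBehindPrimary(members);
console.log(JSON.stringify(lag)); // {"member-a":0,"member-c":0}
```

In a live shell the same function could presumably be fed rs.status().members, provided optime has the { t, i } shape shown in this ticket.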
