[SERVER-8476] slaveDelay with Ghostsync Created: 08/Feb/13  Updated: 06/Dec/22  Resolved: 08/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.3.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hiroaki Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 0
Labels: sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.2 x86_64


Issue Links:
Depends
depends on SERVER-7200 use oplog as op buffer on secondaries Closed
Related
related to SERVER-4935 Mark node Recovering when replication... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

Problem

We ran into a serious problem of reading STALE DATA.

The problem comes from the combination of slaveDelay and ghost sync.

Situation

Replica set members

"members" : [
{  "_id" : 0,
   "host" : "192.168.159.133:27017",
   "priority" : 2
},{"_id" : 1,
   "host" : "192.168.159.134:27017"
},{"_id" : 2,
   "host" : "192.168.159.135:27017",
   "priority" : 0,
   "slaveDelay" : 300
}]
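
For reference, a setup like this can be applied from the mongo shell with rs.reconfig(). A minimal sketch using the hosts above:

var cfg = rs.conf();
cfg.members = [
    { _id : 0, host : "192.168.159.133:27017", priority : 2 },
    { _id : 1, host : "192.168.159.134:27017" },
    { _id : 2, host : "192.168.159.135:27017", priority : 0, slaveDelay : 300 }
];
rs.reconfig(cfg);  // the shell helper bumps the config version itself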

Problem1 : syncFrom

rs.syncFrom('192.168.159.135:27017')
{
   "syncFromRequested" : "192.168.159.135:27017",
   "warning" : "requested member is more than 10 seconds behind us",
   "prevSyncTarget" : "192.168.159.133:27017",
   "ok" : 1
}

We can see this warning if we request the wrong sync target while writes are flowing.
But we do not get this warning when the replica set is idle, because at that moment the delayed member is not actually more than 10 seconds behind.

rs.syncFrom('192.168.159.135:27017')
{
   "syncFromRequested" : "192.168.159.135:27017",
   "prevSyncTarget" : "192.168.159.133:27017",
   "ok" : 1
}

This problem can lead to human error,
but it is bearable because we can avoid it.
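
Even so, the resulting sync source can be double-checked after calling rs.syncFrom(). A rough shell sketch, assuming this version reports a syncingTo field in rs.status():

var status = rs.status();
status.members.forEach(function (m) {
    // syncingTo is only present on members that are pulling from another member
    if (m.syncingTo) {
        print(m.name + " is syncing from " + m.syncingTo);
    }
});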

Problem2 : Automatic ghost sync caused by network trouble.

On 192.168.159.133, simulate network trouble:

iptables -A INPUT -p tcp --dport 27017 -s 192.168.159.134 -j DROP

Then 192.168.159.134 is still available!!
192.168.159.134 changes its sync target from the primary (192.168.159.133) to the slaveDelay secondary (192.168.159.135) and STAYS ALIVE in spite of now being delayed!!

But we (the mongo client) cannot tell that 192.168.159.134 is now delayed.
We think the node should die (become unreachable from clients) instead of becoming unexpectedly delayed.
Then we (the client) could read fresh data from the primary.
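
One workaround is to poll the members' optimes from a monitoring script, so the silent delay at least becomes visible. A rough shell sketch, assuming rs.status() exposes optimeDate in this version:

var status = rs.status();
var primary = null;
status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") { primary = m; }
});
if (primary === null) {
    print("no primary visible");
} else {
    status.members.forEach(function (m) {
        if (m.stateStr === "SECONDARY") {
            // Date arithmetic yields milliseconds
            var lagSecs = (primary.optimeDate - m.optimeDate) / 1000;
            print(m.name + " lags " + lagSecs + "s behind the primary");
        }
    });
}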



 Comments   
Comment by Eric Milkie [ 08/Sep/16 ]

This was implemented via SERVER-7200 and SERVER-12861.

Comment by Hiroaki [ 01/Mar/13 ]

Thanks, a write concern can be a good measure for me. I'll try it and wait for a proper fix.

Comment by Kristina Chodorow (Inactive) [ 20/Feb/13 ]

Maybe mongod should go into recovering state if it's syncing from a "less preferable" node (slave delayed, behind, etc.).

In the meantime, you might want to use write concern (e.g., w:2) to ensure a write has been replicated to the secondary. If getLastError returns a timeout, you can send subsequent reads to the primary. If it returns success, you know that the secondary is up-to-date.
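
For illustration, a minimal 2.x-era shell sketch of this pattern (the collection name is hypothetical):

db.pageviews.insert({ page : "home", ts : new Date() });
// Block until the write has reached 2 members (w:2) or 5 seconds pass
var gle = db.getLastErrorObj(2, 5000);
if (gle.err == null) {
    // Replicated to a secondary: reads routed there will see this write
    print("replicated");
} else {
    // e.g. gle.err == "timeout": send subsequent reads to the primary
    print("not confirmed: " + gle.err);
}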

Comment by Hiroaki [ 20/Feb/13 ]

Problem #1 is OK. This is merely a messaging issue.

Problem #2:
I'm aware of the "best effort" behavior you mentioned, and I'm totally in favor of the design of both slaveDelay and ghost sync (chained replication?).

> In general, ...

But I cannot agree with you there, even when the application can tolerate stale reads.

Of course, I read from the primary when necessary. But I also have to read from secondaries as much as possible, for performance.

For instance:

  • Read page view data from secondaries.
  • Read the actual data from the primary before updating it.
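
For illustration, that split could look like this in a 2.2+ shell (collection names are hypothetical; older shells would use rs.slaveOk() instead):

// Page views: stale-tolerant and high volume, so read from secondaries
db.getMongo().setReadPref("secondaryPreferred");
var views = db.pageviews.count({ page : "home" });

// Before an update: read the authoritative copy from the primary
db.getMongo().setReadPref("primary");
var doc = db.items.findOne({ _id : 42 });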

Reading data that is 10 seconds stale is very different from unintentionally reading data that is 3600 seconds stale.
I want to emphasize the word unintentionally.

I think almost all applications would find it difficult to keep their service running in this situation, so the operator in charge has to act immediately.

But it is also difficult to catch this problem promptly unless every mongod's log is watched in real time.

This combination can be fatal for a service!!

My suggestion
So, I want the mongod node to report its status to clients.

Incidentally, I think mongod can easily judge its own status by itself.

ReplSetImpl::getMemberToSyncTo() in rs_initialsync.cpp

  1. It could not find a valid primary. (already done)
  2. The sync target selected on the second attempt was a slaveDelay (or hidden) member.
     In that case, this mongod node is in a poor position to provide fresh data to the application.
Comment by Kristina Chodorow (Inactive) [ 19/Feb/13 ]

For problem #1, perhaps we should always warn if the member is configured to be delayed.

For problem #2, this behavior is by design. 192.168.159.134 will make a "best effort" attempt to sync from a non-delayed member, but if the only member available is delayed and ahead of 192.168.159.134, 192.168.159.134 will sync from it.

In general, do not read from secondaries unless your application can tolerate stale reads.

Comment by Hiroaki [ 18/Feb/13 ]

I think, "attempts == 0 &&" is not for the sake of REPLSET in the aspect of data consistency.

rs_initialsync.cpp:204

if (attempts == 0 &&
    (myConfig().slaveDelay < m->config().slaveDelay || m->config().hidden)) {
    continue; // skip this one in the first attempt
}
