[SERVER-4439] Replica set members shouldn't sync from a node that is very behind
Created: 06/Dec/11  Updated: 27/Oct/15  Resolved: 15/Jun/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.1.2

Type: Bug
Priority: Major - P3
Reporter: Eliot Horowitz (Inactive)
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-4664 Adding slaveDelay on a passive second... Closed
Related
related to SERVER-5454 backups using fsync from secondaries ... Closed
related to SERVER-6106 Add tests for improved secondary sync... Closed
related to DOCS-154 Add section to replication-internals ... Closed
is related to SERVER-4750 Secondary syncs to another secondary ... Closed
Operating System: ALL
Participants:

 Description   

This applies to both initial sync and "normal" sync.



 Comments   
Comment by Ian Whalen (Inactive) [ 15/Jun/12 ]

The bulk of the work is done; a new ticket has been created and linked to track the testing improvements that still need to be made.

Comment by auto [ 25/May/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-4439 don't sync from a node that is more than 10 minutes behind the primary

This only affects choosing a sync target; if a target eventually falls 10 minutes behind,
a different target is not then chosen.
Branch: master
https://github.com/mongodb/mongo/commit/ec0ca41bdc2a80454fb7d89d4e40d6d7479b769a
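
The rule in the commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the server's C++ code: the Member type, its fields, and choose_sync_target are hypothetical names, and preferring the lowest ping among the remaining candidates is an assumption. Skipping the filter when no primary is known follows the assumption raised elsewhere in this ticket, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Threshold taken from the commit message: 10 minutes.
MAX_SYNC_SOURCE_LAG_SECS = 10 * 60


@dataclass(frozen=True)
class Member:
    """Hypothetical view of a replica set member (not the server's own type)."""
    name: str
    optime_secs: int   # time of the last applied oplog entry, in seconds
    ping_ms: float     # round-trip time as seen by the node that is choosing
    is_primary: bool = False


def choose_sync_target(members: Sequence[Member]) -> Optional[Member]:
    """Pick a sync target once; the choice is not revisited if it later lags."""
    primary = next((m for m in members if m.is_primary), None)
    candidates = list(members)
    if primary is not None:
        # Exclude members that are more than 10 minutes behind the primary.
        candidates = [
            m for m in candidates
            if primary.optime_secs - m.optime_secs <= MAX_SYNC_SOURCE_LAG_SECS
        ]
    # Assumption: among the remaining members, prefer the nearest by ping.
    return min(candidates, key=lambda m: m.ping_ms, default=None)


members = [
    Member("a:27017", optime_secs=10_000, ping_ms=12.0, is_primary=True),
    Member("b:27017", optime_secs=9_200, ping_ms=1.0),  # 800 s behind: excluded
    Member("c:27017", optime_secs=9_900, ping_ms=5.0),  # 100 s behind: allowed
]
print(choose_sync_target(members).name)  # c:27017
```

Because the check runs only at selection time, a target that later falls more than 10 minutes behind is kept, which matches the limitation noted in the commit message.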

Comment by Robert DiBetta [ 06/Apr/12 ]

Hello,
Can this fix be backported into a 2.0.x release?

Thanks much!
Bob DiBetta

Comment by Geoffrey Gallaway [ 05/Apr/12 ]

I'd like this to be tunable somehow. Data that is 10 minutes stale is ancient if you have code making business decisions (routing, stats, etc.) on that data. This is especially true since the times when replication delay is most harmful (traffic spikes, etc.) are also the times when secondaries are most likely to fall behind. 1 minute would be more comfortable but may cause too many reconnections and flapping.

I'd assume this will make sure there is a secondary that is more up to date and reconnect to that secondary rather than randomly bouncing around looking for a more up-to-date secondary?

Comment by Eliot Horowitz (Inactive) [ 06/Dec/11 ]

I think 10 minutes is a good starting point.

Comment by Kristina Chodorow (Inactive) [ 06/Dec/11 ]

How far behind should be allowed? 1 minute? 10? How should that be balanced with trying to clone from the nearest member?

I assume that this should be adjusted for a slaveDelay node.

And I'm assuming, if there's no primary, it'll just clone from whomever.

Generated at Thu Feb 08 03:05:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.