[SERVER-3251] Overflowed replica set members briefly re-initialize as SECONDARY Created: 13/Jun/11  Updated: 06/Dec/22  Resolved: 23/Nov/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Greg Studer Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: elections
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10-11


Issue Links:
Duplicate
is duplicated by SERVER-24229 Slave after restart goes to secondary... Closed
Related
is related to SERVER-7177 state transition for replica sets isn... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

... this could lead to slaveOk queries hitting very stale nodes?

Reproduce with toostale.js: SECONDARY status is temporarily observed on startup.
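
One way to watch for the transient state from the mongo shell (a sketch, separate from toostale.js; the host name and polling interval are illustrative):

  // Poll the restarted member's reported state; it briefly reports SECONDARY
  // before dropping back to RECOVERING once it learns it is stale.
  var conn = new Mongo("localhost:27017");
  while (true) {
      var status = conn.getDB("admin").runCommand({replSetGetStatus: 1});
      if (status.ok) {
          var me = status.members.filter(function (m) { return m.self; })[0];
          print(new Date() + " " + me.stateStr);
      }
      sleep(500);
  }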



 Comments   
Comment by Spencer Brody (Inactive) [ 23/Nov/16 ]

With the introduction of the maxStaleness read preference (SERVER-4936), it is now possible to make the driver avoid routing queries to very stale secondaries, preventing this from causing any issues.
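
For reference, maxStalenessSeconds is specified as part of the read preference, typically in the connection string; the hosts, replica set name, and the 120-second threshold below are illustrative:

  mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0&readPreference=secondary&maxStalenessSeconds=120

With this set, the driver estimates each secondary's lag from its periodic server checks and excludes members whose estimated staleness exceeds the threshold.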

Comment by Paul Ridgway [ 23/May/16 ]

Ok, thanks for the advice!

Comment by Eric Milkie [ 23/May/16 ]

Reconfigure the node to be "hidden" until it is caught up. This requires two manual interventions: one reconfiguration to hide the node and a second to unhide it once it has caught up. It would be better if this were automatically handled by the driver, which is what SERVER-12861 aims to address.
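
A sketch of that reconfiguration in the mongo shell, run against the primary (the member index 2 is illustrative; a hidden member must also have priority 0):

  // First intervention: hide the lagging member so drivers stop routing reads to it.
  cfg = rs.conf()
  cfg.members[2].priority = 0
  cfg.members[2].hidden = true
  rs.reconfig(cfg)

  // Second intervention, once the member has caught up: make it visible again.
  cfg = rs.conf()
  cfg.members[2].hidden = false
  cfg.members[2].priority = 1
  rs.reconfig(cfg)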

Comment by Paul Ridgway [ 21/May/16 ]

Do you have advice on how users should deal with the situation after maintenance, when the resync is unusually large?

Comment by Eric Milkie [ 20/May/16 ]

It's a legitimate issue.

Comment by Paul Ridgway [ 20/May/16 ]

We have had a node stay in SECONDARY state (with an oplog lag of 12 hours) for the whole time, and we had to use iptables to allow only replication traffic. Do you consider that by design or a legitimate issue?

Comment by Kristina Chodorow (Inactive) [ 28/Sep/12 ]

This is, debatably, by design. What happens is: if the member doesn't have anyone to sync from (which it never does right at startup) and it is caught up to its own minvalid, it goes straight into SECONDARY state. This is exactly what we want if the node is unable to reach any other nodes in the set.

However, it usually contacts the other members a second or two later and finds out that it is stale, at which point it drops back to the RECOVERING state.

One possible solution would be to force members to stay in RECOVERING for a nominal time (10 seconds?) at startup.
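
As a sketch of that proposal (illustrative pseudocode only, not the server's actual state-transition code; the function, field names, and the 10-second grace period are assumptions taken from the comment above):

  // Current behavior: with no sync source and caught up to its own minvalid,
  // the member goes straight to SECONDARY.
  // Proposed: hold it in RECOVERING for a nominal grace period after startup.
  var STARTUP_GRACE_MILLIS = 10 * 1000;

  function chooseStartupState(member) {
      var caughtUpToMinValid = member.appliedOpTime >= member.minValid;
      if (!member.hasSyncSource && caughtUpToMinValid) {
          if (Date.now() - member.startTimeMillis < STARTUP_GRACE_MILLIS) {
              return "RECOVERING";  // proposed: wait before advertising SECONDARY
          }
          return "SECONDARY";       // current behavior
      }
      return "RECOVERING";
  }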
