[SERVER-6255] RS102 should automatically attempt repair Created: 29/Jun/12  Updated: 06/Dec/22  Resolved: 14/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.5
Fix Version/s: None

Type: Improvement
Priority: Minor - P4
Reporter: Jed Smith
Assignee: Backlog - Replication Team
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Amazon Linux
Assigned Teams: Replication
Participants:

 Description   

When a replica falls back to RS102, is it safe to try to fix that automatically by resyncing the DB? Administrator intervention is in most cases a good thing, so I'm just testing the water. I'd imagine there's a worry about data corruption here that I'm overlooking.

Perhaps this should be an option? (If I hit RS102, resync from scratch?) We missed a few backup replicas because we don't really monitor them, and it would have rocked had they fixed themselves.



 Comments   
Comment by Jed Smith [ 29/Jun/12 ]

Sure, agreed on the manual load, just seeing if it's something you guys have thought about making optional. I did indeed fully resync these replicas.

Thanks for the response.

Comment by Scott Hernandez (Inactive) [ 29/Jun/12 ]

I'm sure you have already read this, since it is printed in the logs when this comes up, but I'll summarize some important points below: http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member

RS102 means that replication fell behind for too long, or that a replica was offline for too long, so it can no longer catch up from the oplog. It generally means you need a larger oplog to cover that much downtime or replication delay.
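To make the staleness condition concrete, here is a minimal sketch of the comparison involved. It assumes you have already obtained three timestamps yourself: the oldest and newest entries in the sync source's oplog, and the last operation the stale member applied. The function names `oplog_window` and `is_too_stale` are hypothetical and are not part of MongoDB or any driver.

```python
from datetime import datetime, timedelta


def oplog_window(first_entry: datetime, last_entry: datetime) -> timedelta:
    """Return the time span covered by the sync source's oplog."""
    return last_entry - first_entry


def is_too_stale(member_last_applied: datetime,
                 source_oplog_first: datetime) -> bool:
    """Return True if the member's last applied operation is older than the
    oldest entry still held in the sync source's oplog.

    In that case the operations the member needs next have already rolled
    off the oplog, so it cannot catch up by normal replication.
    """
    return member_last_applied < source_oplog_first
```

If the window returned by `oplog_window` is shorter than the longest downtime or lag you expect a member to experience, that is a hint the oplog may be too small for your deployment.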

Recovery is a manual step because of the added load it places on another system: the stale member has to do a full resync from the primary.

This is a case where no one behavior fits everyone's needs, so it is left as a manual step for the admin to manage. One thing some people do is wipe the data dir during a restart to initiate the full resync, but that is something you need to script yourself.
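As an illustration of the "wipe the data dir" step only, here is a minimal sketch in Python. It assumes the mongod process for that member is already stopped and that you pass in its data directory yourself; stopping and restarting the service is deliberately left out because it depends on your init system. The helper name `wipe_data_dir` is hypothetical, and the operation is destructive, so treat this as a starting point rather than a recommended procedure.

```python
import shutil
from pathlib import Path


def wipe_data_dir(dbpath: str) -> int:
    """Delete everything inside dbpath, keeping the directory itself.

    Assumption: mongod is NOT running against this directory.
    Returns the number of top-level entries removed.
    """
    root = Path(dbpath)
    if not root.is_dir():
        raise NotADirectoryError(dbpath)

    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it and everything under it.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just the entry itself.
            entry.unlink()
        removed += 1
    return removed
```

Keeping the directory itself (rather than removing and recreating it) preserves its ownership and permissions, which the restarted process will usually need.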
