[SERVER-3251] Overflowed replica set members briefly re-initialize as SECONDARY Created: 13/Jun/11 Updated: 06/Dec/22 Resolved: 23/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Greg Studer | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | elections |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Ubuntu 10-11 |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: | |
| Description |
|
... this could lead to slaveOk queries hitting very stale nodes. To reproduce: run toostale.js; SECONDARY status is temporarily observed on startup. |
| Comments |
| Comment by Spencer Brody (Inactive) [ 23/Nov/16 ] |
|
With the introduction of the maxStaleness read preference ( |
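The comment above is truncated in the export, but maxStaleness is exposed to applications as a read-preference option. A minimal sketch of opting into it via a connection string (host and replica set names are hypothetical; `maxStalenessSeconds` has a 90-second minimum), so that drivers skip secondaries whose estimated lag exceeds the bound:

```
mongodb://host1,host2,host3/?replicaSet=rs0&readPreference=secondaryPreferred&maxStalenessSeconds=120
```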
| Comment by Paul Ridgway [ 23/May/16 ] |
|
Ok, thanks for the advice! |
| Comment by Eric Milkie [ 23/May/16 ] |
|
Reconfigure the node to be "hidden" until it is caught up. This requires two manual interventions. It would be better if this were automatically handled by the driver, which is what |
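The two manual interventions described above might look like the following mongo shell sketch, run against the primary (the member index `2` is hypothetical; this assumes a live replica set):

```javascript
// Intervention 1, before the stale member rejoins: hide it so
// drivers will not route slaveOk/secondary reads to it.
cfg = rs.conf();
cfg.members[2].priority = 0;   // hidden members must have priority 0
cfg.members[2].hidden = true;
rs.reconfig(cfg);

// Intervention 2, once rs.printSecondaryReplicationInfo() shows the
// member has caught up: unhide it again.
cfg = rs.conf();
cfg.members[2].hidden = false;
cfg.members[2].priority = 1;
rs.reconfig(cfg);
```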
| Comment by Paul Ridgway [ 21/May/16 ] |
|
Do you have advice on how users should deal with the situation after maintenance where the resync is unnaturally large? |
| Comment by Eric Milkie [ 20/May/16 ] |
|
It's a legitimate issue. |
| Comment by Paul Ridgway [ 20/May/16 ] |
|
We have had a node remain in SECONDARY state the whole time despite an oplog lag of 12 hours, and had to use iptables to allow only replication traffic to it. Do you consider that by design, or a legitimate issue? |
| Comment by Kristina Chodorow (Inactive) [ 28/Sep/12 ] |
|
This is, debatably, by design. What happens is: if the member has no one to sync from (which it never does right at startup) and it is caught up to its own minvalid, it goes straight into SECONDARY state. This is exactly what we want if the node is unable to reach any other nodes in the set. However, it usually contacts the other members a second or two later and finds out that it is stale, at which point it drops back to RECOVERING state. One possible solution would be to force members to stay in RECOVERING for a nominal time (10 seconds?) at startup. |
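The startup sequence described above can be sketched as a toy state machine (all names and the lag threshold are invented for illustration; real mongod uses oplog optimes and heartbeats, not plain numbers):

```javascript
// At startup there is no sync source yet; if the member has applied
// everything up to its own minvalid, it transitions straight to
// SECONDARY -- even if the rest of the set is far ahead.
function startupState(member) {
  return member.lastApplied >= member.minValid ? "SECONDARY" : "RECOVERING";
}

// A second or two later a heartbeat arrives; if the member's oplog is
// too far behind the primary, it drops back to RECOVERING.
function onHeartbeat(member, primaryOpTime, maxLag) {
  return primaryOpTime - member.lastApplied > maxLag
    ? "RECOVERING"
    : "SECONDARY";
}

// A member that is 12 hours behind but caught up to its own minvalid:
const stale = { lastApplied: 100, minValid: 100 };
console.log(startupState(stale));             // "SECONDARY" (the bug window)
console.log(onHeartbeat(stale, 50000, 3600)); // "RECOVERING"
```

The window between the two calls is where slaveOk reads can land on the stale node.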