[SERVER-24229] Slave after restart goes to secondary state even if it is out of sync. Created: 20/May/16 Updated: 20/May/16 Resolved: 20/May/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.9, 3.0.12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bartosz Debski | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Stop a replica set member for some time, e.g. 1 hour, assuming there is enough oplog for the member to catch up. Start the replica member again and check the logs, along with the mongo CLI (see the sketch after this table). |
| Participants: | |
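To make the reproduction step concrete, here is a minimal sketch of inspecting member states from a driver rather than the mongo shell. It assumes pymongo and illustrative hostnames; `rs.status()` in the shell wraps the same `replSetGetStatus` command used below.

```python
# Minimal sketch (pymongo, illustrative hostnames): poll member states after
# restarting a member. rs.status() wraps the same replSetGetStatus command.
from pymongo import MongoClient

client = MongoClient("mongodb://db1.example.net:27017")  # any reachable member

status = client.admin.command("replSetGetStatus")
for member in status["members"]:
    # stateStr is e.g. STARTUP2, RECOVERING, SECONDARY, PRIMARY;
    # optimeDate shows how far this member has replicated.
    print(member["name"], member["stateStr"], member.get("optimeDate"))
```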
| Description |
|
Hi, I have found that when a member rejoins the replica set after a period of inactivity, e.g. maintenance downtime on the server that hosts the database, it moves very quickly from RECOVERING to SECONDARY state even if its data is not fully synced. That resulted in inconsistent results from reads. We had to restrict all non-replication traffic with iptables until the member had caught up with replication. E.g. logs on startup of a member
and replication info:
We see the same behaviour on 3.0.9 and 3.0.12, tested on two different databases. |
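The iptables workaround described above can also be expressed as application-side gating. The following is a rough sketch, assuming pymongo, illustrative hostnames, and an arbitrary lag threshold: it holds read traffic away from the restarted member until its optime is close to the primary's, which is what the firewall rules achieved operationally.

```python
# Rough sketch of the workaround described above, in application logic rather
# than iptables. Hostnames, threshold, and poll interval are assumptions.
import time
from pymongo import MongoClient

RESTARTED_MEMBER = "db2.example.net:27017"   # the member that was down
MAX_LAG_SECONDS = 10                         # acceptable staleness for reads

client = MongoClient("mongodb://db1.example.net:27017")

def replication_lag_seconds():
    status = client.admin.command("replSetGetStatus")
    members = {m["name"]: m for m in status["members"]}
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    lag = primary["optimeDate"] - members[RESTARTED_MEMBER]["optimeDate"]
    return lag.total_seconds()

# Keep read traffic away from the restarted member until it has caught up.
while replication_lag_seconds() > MAX_LAG_SECONDS:
    time.sleep(5)
print("Member caught up; safe to route reads to it again.")
```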
| Comments |
| Comment by Eric Milkie [ 20/May/16 ] |
|
Right now, we're actively working on something for 3.4 to support this that involves driver changes. It may slip past 3.4, however. |
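The comment does not say which driver change is meant, so the following is only a guess at a related feature: the maxStalenessSeconds read preference option from the 3.4-era driver specifications, which lets drivers avoid secondaries whose estimated lag exceeds a threshold. The sketch assumes pymongo and illustrative connection details.

```python
# Hedged sketch: maxStalenessSeconds read preference (3.4-era drivers).
# Whether this is the exact work referenced in the comment is an assumption.
from pymongo import MongoClient
from pymongo.read_preferences import SecondaryPreferred

client = MongoClient("mongodb://db1.example.net,db2.example.net/?replicaSet=rs0")

# The driver will not select a secondary whose estimated staleness exceeds
# max_staleness (the minimum allowed value is 90 seconds).
db = client.get_database(
    "mydb",
    read_preference=SecondaryPreferred(max_staleness=120),
)
doc = db.mycoll.find_one()
```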
| Comment by Bartosz Debski [ 20/May/16 ] |
|
Fair enough. So will we have any ETA on |
| Comment by Eric Milkie [ 20/May/16 ] |
|
It's a duplicate of |
| Comment by Bartosz Debski [ 20/May/16 ] |
|
I would not agree that's a duplicate of |
| Comment by Eric Milkie [ 20/May/16 ] |
|
While this is indeed the current behavior, it is less than ideal. This ticket is a duplicate of
|
| Comment by Paul Ridgway [ 20/May/16 ] |
|
Additionally, we'd like to know: is this expected/desired behaviour? We seem to recall that it previously waited in STARTUP2 or RECOVERING while the oplog caught up, and only then started serving reads. Perhaps that was after a complete re-sync, but we'd like to clarify. Thanks |