[SERVER-23522] Replica set recovery doesn't happen immediately at boot with 3.2 as it did with <3.2 Created: 04/Apr/16 Updated: 07/Jun/16 Resolved: 07/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nathan Neulinger | Assignee: | Kelsey Schubert |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Steps To Reproduce: | Set up a replica set with 2 nodes and an arbiter. Kill the primary. Start it back up with an empty data directory. |
| Participants: | |
| Description |
|
Had to recover an instance today that was previously the primary in a replica set (2 data-bearing nodes + an arbiter). Followed the same procedure I've used in the past with lightly loaded instances - started it with an empty data directory. In the past, this would immediately start recovering from the standby instance. With this current 3.2.4 deployment, it sat there for a bit over 3 minutes before it started the recovery/rebuild process. Is that expected with 3.2, or is there some new tuning parameter? |
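For reference, this is a minimal sketch of the kind of deployment described above (a replica set with 2 data-bearing members and an arbiter) as it might be initiated from Python with pymongo. The hostnames host1/host2/host3 and the set name rs0 are placeholders, not values from this deployment.

```python
from pymongo import MongoClient

# Connect directly to the member that should become the first primary.
# directConnection avoids replica-set discovery before the set exists
# (supported by recent pymongo versions).
client = MongoClient("mongodb://host1:27017/?directConnection=true")

# Two data-bearing members plus an arbiter, matching the setup in this ticket.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "host1:27017"},
        {"_id": 1, "host": "host2:27017"},
        {"_id": 2, "host": "host3:27017", "arbiterOnly": True},
    ],
}

# replSetInitiate is the server command behind the mongo shell helper rs.initiate().
client.admin.command("replSetInitiate", config)
```

Restarting one of the data-bearing members with an empty data directory forces it to perform an initial sync from the remaining member; the delay before that sync began is the behavior reported in this ticket.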
| Comments |
| Comment by Kelsey Schubert [ 07/Jun/16 ] |
|
Thank you for providing the additional information. The network issue observed in the logs indicates that this behavior is a result of the issue this ticket was closed as a duplicate of. Thanks again, Kelsey |
| Comment by Nathan Neulinger [ 06/Apr/16 ] |
|
Logs attached. Prior to 22:45 I tried several times to bring it back online, including trying to re-apply (incorrectly, as you'll see) the replica set config to 'kick it'. The 22:4x attempt was the last one; I let it sit and it eventually came online. I'll see if I can reproduce this symptom. |
| Comment by Nathan Neulinger [ 06/Apr/16 ] |
|
While gathering logs from the other nodes, I found some log entries from before the event that make it look like there may have been some issue with cluster state beforehand. I'll attach the logs, but this may be a red herring. |
| Comment by Daniel Pasette (Inactive) [ 06/Apr/16 ] |
|
That is not expected behavior. If you have the full log files from the other members of the replica set, would you be able to compress and attach them to this ticket? |
| Comment by Nathan Neulinger [ 04/Apr/16 ] |
|