[SERVER-4862] After replica set members are restarted, they go into a recovery spin
Created: 03/Feb/12 Updated: 15/Aug/12 Resolved: 07/Mar/12
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Chris Westin | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | 64-bit Windows |
| Attachments: | Logs for the three servers from the point of restart (see Description) |
| Operating System: | ALL |
| Participants: | |
| Description |
I had a 3-member replica set running on my laptop, with a small database used only for a training class. I had to reboot the laptop, and after rebooting I restarted the replica set members. The former primary recovered, became a secondary, and, according to rs.status(), waited for the other members. The former secondaries went into some kind of recovery spin, creating many 2 GB local.N files, until I finally killed them. Logs for the three servers from the point of restart are attached.
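For readers reproducing this, a minimal sketch of how the per-member states described above can be inspected from the mongo shell; rs.status() is the standard shell helper, but the printout below is illustrative rather than taken from this ticket:

```javascript
// Connect to any reachable member in the mongo shell and inspect the set.
// rs.status() returns one entry per member, including health (0 = down,
// 1 = up) and stateStr (e.g. PRIMARY, SECONDARY, RECOVERING).
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr + " (health: " + m.health + ")");
});
```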
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 07/Mar/12 ] |
It looks like the secondaries were killed while they were allocating the oplog. According to Mathias, there isn't anything we should be doing differently here. If you had waited for the journal to finish creating the oplog, the members would have started up.
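As a hedged sketch of the suggested resolution (letting the preallocation run to completion rather than killing the process), one could confirm afterward from the mongo shell that the oplog was created; db.printReplicationInfo() is a standard shell helper, and the snippet is illustrative, not taken from this ticket:

```javascript
// Run in the mongo shell once a restarted member finishes starting up.
// If the local.N preallocation completed, the oplog exists and this
// prints its configured size and the time range it currently covers.
var local = db.getSiblingDB("local");
local.printReplicationInfo();
```

Separately, starting mongod with a smaller --oplogSize (specified in megabytes) would shorten the preallocation phase, at the cost of a shorter replication window.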
| Comment by Chris Westin [ 03/Feb/12 ] |
During this time, rs.status() on the previous primary reported that it was now a secondary and that the other two nodes were unreachable/unhealthy.