[SERVER-15353] MongoDB crash left one shard unable to recover Created: 23/Sep/14 Updated: 23/Sep/14 Resolved: 23/Sep/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Eric Coutu | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
Setup: 8-shard cluster, each shard a replica set consisting of a primary, 2 secondaries, a hidden secondary (for backups), and an arbiter.

We were in the process of resyncing two nodes on one of our shards (to release disk space to the operating system) when the remaining data-replicating secondaries simultaneously crashed (ran out of disk space). One of the nodes that was in the process of resyncing appears to have finished: it was in the RECOVERING state and had reached the same level of disk usage as the other nodes in the shard. I immediately backed up the data directory of this node.

I've tried redeploying the shard using the salvaged data directory from this node, but the replica set doesn't want to elect a primary; all nodes stay in the STARTUP2 state: "initial sync need a member to be primary or secondary to do our initial sync". I can start the nodes as standalones and access the data. I need this shard to re-form a replica set so the cluster can perform again. I'm not worried about data inconsistency, as most of it is "relatively" volatile.

Seeing log lines such as this:

    Sep 22 23:54:08 terra mongod.10001[1716]: Mon Sep 22 23:54:08.684 [rsSync] replSet initial sync pending
    Sep 22 23:53:40 terra mongos.27017[30808]: Mon Sep 22 23:53:40.603 [ReplicaSetMonitorWatcher] warning: No primary detected for set rs1

Is there any way to force the replica set to re-form with the data that is available? |
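For anyone wanting to reproduce the standalone check described above, a minimal sketch follows; the port, dbpath, and log path are illustrative assumptions, not values taken from this report:

    # Start the salvaged node WITHOUT --replSet so it comes up standalone
    # (illustrative port and paths)
    mongod --port 10001 --dbpath /data/rs1 --logpath /var/log/mongod-standalone.log --fork

    # Connect and verify the data is readable
    mongo --port 10001 --eval 'printjson(db.adminCommand({listDatabases: 1}))'

Omitting --replSet is what lets the node skip replica-set initialization entirely, which is why the data is reachable this way even while the set itself is stuck in STARTUP2.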
| Comments |
| Comment by Ramon Fernandez Marina [ 23/Sep/14 ] |
|
Hi eric.coutu@sweetiq.com, glad to hear you were able to recover your replica set. Note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server and tools. A question like this, which involves more discussion, would be best posted on the mongodb-user group or on Stack Overflow with the mongodb tag, where it will reach a larger audience. Regards, |
| Comment by Eric Coutu [ 23/Sep/14 ] |
|
Fixed it. For anyone stuck in STARTUP2 limbo: wipe out the local.* files on every instance connected to the replica set, start the MongoDB processes again, then recreate the replica set. |
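A minimal sketch of that procedure, assuming a dbpath of /data/rs1 and hostnames terra1/terra2/terra3 (all illustrative, not the reporter's actual topology):

    # On EVERY member: stop mongod, then delete the local database files.
    # This wipes the old replica set config and oplog but leaves user data intact.
    rm /data/rs1/local.*

    # Restart each mongod with its usual --replSet rs1 option, then on one node:
    mongo --port 10001
    > rs.initiate()
    > rs.add("terra2:10001")
    > rs.add("terra3:10001")

Since the oplog is deleted along with the rest of the local database, the remaining members perform a fresh initial sync from the newly elected primary instead of trying to resume replication from a now-missing history.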