Core Server / SERVER-15353

MongoDB crash left one shard unable to recover

Details

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3

    Description

      Setup: 8-shard cluster, each shard a replica set consisting of a primary, 2 secondaries, a hidden secondary (for backups), and an arbiter.
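
      For reference, each shard's replica set config looks roughly like this (hostnames are illustrative; the set name and port are taken from the log lines below):

      {
          "_id" : "rs1",
          "version" : 1,
          "members" : [
              { "_id" : 0, "host" : "node0.example:10001" },
              { "_id" : 1, "host" : "node1.example:10001" },
              { "_id" : 2, "host" : "node2.example:10001" },
              { "_id" : 3, "host" : "node3.example:10001", "priority" : 0, "hidden" : true },
              { "_id" : 4, "host" : "node4.example:10001", "arbiterOnly" : true }
          ]
      }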

      We were in the process of resyncing two nodes on one of our shards (to release disk space back to the operating system) when the remaining data-bearing secondaries simultaneously crashed (they ran out of disk space).

      One of the nodes that was in the process of resyncing appears to have finished - it was in the RECOVERING state and had reached the same level of disk usage as the other nodes in the shard. I immediately backed up the data directory of this node.

      I've tried redeploying the shard using the salvaged data directory from this node, but the replica set won't elect a primary - all nodes stay in the STARTUP2 state with "initial sync need a member to be primary or secondary to do our initial sync". I can start the nodes as standalones and access the data - I need this shard to re-form a replica set so the cluster can function again. I'm not worried about data inconsistency, as most of the data is "relatively" volatile.
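
      (By "standalone" I mean restarting mongod without the --replSet option, roughly as follows - the dbpath is illustrative, the port is from our logs:)

      mongod --port 10001 --dbpath /var/lib/mongodb/rs1
      mongo --port 10001
      > db.getSiblingDB("local").system.replset.find()   // shows the replica set config stored in the local database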

      I'm seeing log lines such as these:

      Sep 22 23:54:08 terra mongod.10001[1716]: Mon Sep 22 23:54:08.684 [rsSync] replSet initial sync pending
      Sep 22 23:54:08 terra mongod.10001[1716]: Mon Sep 22 23:54:08.684 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync

      Sep 22 23:53:40 terra mongos.27017[30808]: Mon Sep 22 23:53:40.603 [ReplicaSetMonitorWatcher] warning: No primary detected for set rs1
      Sep 22 23:53:40 terra mongos.27017[30808]: Mon Sep 22 23:53:40.603 [ReplicaSetMonitorWatcher] All nodes for set rs1 are down. This has happened for 7 checks in a row. Polling will stop after 23 more failed checks

      Is there any way to force the replica set to reform with the data that is available?
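
      (One approach I'm considering, pieced together from the documentation - hostnames and dbpath are illustrative, the port and set name come from the logs above: rebuild the set from the salvaged node by starting it standalone, dropping the old replication metadata, re-initiating, and letting the other nodes initial-sync from it. Would this be safe?)

      # 1. start the salvaged node without --replSet, then clear the old replication metadata
      mongod --port 10001 --dbpath /var/lib/mongodb/rs1
      mongo --port 10001
      > use local
      > db.dropDatabase()        // drops the oplog and the stored replica set config

      # 2. restart with --replSet and initiate a fresh single-member set
      mongod --port 10001 --dbpath /var/lib/mongodb/rs1 --replSet rs1
      mongo --port 10001
      > rs.initiate()

      # 3. once this node becomes PRIMARY, add the other (wiped) members back so they initial sync from it
      > rs.add("node1.example:10001")
      > rs.addArb("node4.example:10001")

      (The other option I've seen mentioned is a forced reconfig - rs.reconfig(cfg, { force: true }) - against the surviving node, but I'm not sure it applies while the node is stuck in STARTUP2.)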

People

    Assignee: Unassigned
    Reporter: quickdry21 (Eric Coutu)
    Votes: 0
    Watchers: 2