[SERVER-17598] Election happens frequently on primary shard Created: 16/Mar/15  Updated: 08/Apr/15  Resolved: 08/Apr/15

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 2.6.6
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Kazuo Yagi Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

To whom it may concern,

I have a problem: elections happen frequently on the primary shard of our sharded cluster. They seem to coincide with heartbeat failures between the primary and the secondary, although there do not appear to be any network problems. Both the network traffic and the number of connections are well below their limits (traffic: 5 Mbps to 15 Mbps; connections: 30 to 60).

Heartbeats have been failing continuously. Probably as a result, elections occur 10 to 20 times a day on the primary shard, compared with once or twice a day at most on the other, normal shards.

Is there a good way to stabilize the primary shard? Our application fails every time an election happens and then has to wait until a new primary is elected.

This problem significantly affects our application's performance. I would appreciate any help you could give me in solving it.

Best Regards,
Kazuo Yagi <ka_yagi@fancs.com>

  • Our sharded cluster has 10 shards, each consisting of 1 primary, 1 secondary, and 1 arbiter, so there are 30 hosts in total.
    Each member's priority is 1 (the default value). The MongoDB version is 2.6.6 for Ubuntu 14.04, amd64 architecture. All hosts are running on AWS.


 Comments   
Comment by Ramon Fernandez Marina [ 01/Apr/15 ]

ka_yagi@fancs.com, we haven't heard back from you for some time. If this is still an issue for you, can you please provide the logs requested by Sam so we can investigate?

Thanks,
Ramón.

Comment by Sam Kleinman (Inactive) [ 16/Mar/15 ]

Without more information about your deployment, it's difficult to assess the root cause of this issue. Unnecessary failover events occur most commonly when there's some sort of network configuration error.

If you can provide logs from all members of the affected replica set, we can look over them and see if there's an obvious root cause.
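
Before sending the logs, it may help to compare how often election-related messages show up in each member's log per day. The following is only a minimal sketch under stated assumptions: it assumes each log line begins with an ISO-8601 timestamp, and the marker text is a placeholder that you would replace with whatever message your mongod logs actually print around an election. `count_matches_per_day` is a hypothetical helper name, not part of MongoDB.

```python
import re
from collections import Counter

# Assumption: each log line starts with an ISO-8601 timestamp such as
# 2015-03-16T04:12:07.123+0000. Lines without one are skipped.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}")


def count_matches_per_day(lines, needle):
    """Count, per calendar day, the timestamped lines that contain `needle`."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and needle in line:
            counts[match.group(1)] += 1
    return counts


# Placeholder lines for illustration only; these are NOT real mongod output.
sample_lines = [
    "2015-03-16T04:12:07.123+0000 PLACEHOLDER-ELECTION-MARKER",
    "2015-03-16T05:00:00.000+0000 some unrelated message",
    "2015-03-16T06:30:10.456+0000 PLACEHOLDER-ELECTION-MARKER",
    "2015-03-17T01:02:03.789+0000 PLACEHOLDER-ELECTION-MARKER",
    "continuation line without a timestamp PLACEHOLDER-ELECTION-MARKER",
]

per_day = count_matches_per_day(sample_lines, "PLACEHOLDER-ELECTION-MARKER")
for day in sorted(per_day):
    print(day, per_day[day])
```

To run it against a real log, pass an open file object as `lines` (for example `open("mongod.log", errors="replace")`, where the file name is again a placeholder) and repeat for each member, so the per-day counts of the affected shard can be compared with those of a healthy one.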
