[SERVER-5217] Replica set fail-over on high volume latency Created: 06/Mar/12  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.1, 2.0.2
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Sebastian Dahlgren Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 1
Labels: replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian GNU/Linux 6


Issue Links:
Related
related to SERVER-32867 Tie liveness to the ability to replicate Backlog
Assigned Teams:
Replication
Participants:

 Description   

MongoDB's heartbeat function does not monitor the health of disk reads and writes. If the underlying disks on the primary node are having problems, MongoDB will not switch primaries.

I would like a feature in the heartbeat function that includes health checking of read/write performance. It would probably be good if this more extensive heartbeat function were optional. See the discussion on the mongodb-user mailing list: https://groups.google.com/forum/?fromgroups#!starred/mongodb-user/gY7r3f-yz0k.

Right now the only option for us when a node has disk problems is to stop the mongod process in order to force a change of primary node.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 06/Mar/12 ]

Sorry - I didn't mean this would increase the load; I meant that a short-term user load spike could flip the set unnecessarily. Or an overall increase in load could cause the set to flip back and forth constantly.
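
One common way to damp the flapping described here is to require several consecutive probe results before changing the reported health state. The following is a minimal sketch of that idea, not anything MongoDB implements; the class name `HealthDebouncer` and both thresholds are hypothetical. It only addresses short spikes. It does not help with a sustained load increase, where failing over just moves the load.

```python
class HealthDebouncer:
    """Hypothetical helper: flips state only after N consecutive contrary probes."""

    def __init__(self, failures_to_trip=3, successes_to_recover=3):
        self.failures_to_trip = failures_to_trip
        self.successes_to_recover = successes_to_recover
        self.healthy = True   # current reported state
        self._streak = 0      # consecutive probes disagreeing with that state

    def observe(self, probe_ok):
        """Record one probe result and return the (debounced) health state."""
        if probe_ok == self.healthy:
            # Probe agrees with the reported state: any contrary streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = (self.failures_to_trip if self.healthy
                      else self.successes_to_recover)
            if self._streak >= needed:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy
```

With `failures_to_trip=3`, two slow probes followed by a normal one leave the node reported as healthy, so a brief spike would not by itself trigger a fail-over.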

Comment by Sebastian Dahlgren [ 06/Mar/12 ]

Thanks for the quick feedback, Eliot.

Two thoughts:

  • This would indeed increase the load; however, it could be optional (defaulting to off)
  • The thought about a hook for this is a cool approach; it might be the way to go

Comment by Eliot Horowitz (Inactive) [ 06/Mar/12 ]

Interesting, but tricky.
Concerns:

  • increased load causes the disk to get busy; failing over doesn't help, it just moves the load
  • jitter

This is really an EC2/EBS-specific thing...

What might make more sense is a hook such that you can specify a binary to execute that determines "health".
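
As an illustration of what such an externally supplied health check might look like, here is a minimal sketch that times a small synced write and maps the result to an exit status. Everything about the interface is an assumption: the function names, the argument order, the 0.5 second default threshold, and the convention that exit status 0 means healthy are all hypothetical, since no such hook exists in MongoDB as of this ticket.

```python
import os
import sys
import tempfile
import time


def probe_write_latency(directory, payload_size=4096):
    """Write and fsync a small temp file in `directory`; return elapsed seconds."""
    payload = b"\0" * payload_size
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the data to disk so the timing reflects the device
    finally:
        os.close(fd)
        os.remove(path)
    return time.monotonic() - start


def health_exit_code(latency_seconds, threshold_seconds):
    """Assumed convention: 0 means healthy, 1 means unhealthy."""
    return 0 if latency_seconds <= threshold_seconds else 1


def main(argv):
    """argv[1]: directory to probe (hypothetical); argv[2]: threshold in seconds."""
    directory = argv[1] if len(argv) > 1 else tempfile.gettempdir()
    threshold = float(argv[2]) if len(argv) > 2 else 0.5
    return health_exit_code(probe_write_latency(directory), threshold)
```

To use this as a standalone executable, a wrapper would call `sys.exit(main(sys.argv))`. A single slow write is a noisy signal, so the jitter concern above still applies to any caller that acts on one result.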
