[SERVER-57612] Investigate whether we can perform a fake disk write in a dry-run election Created: 10/Jun/21  Updated: 06/Dec/22  Resolved: 02/Dec/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ali Mir Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-51100 Make dry-run elections write to lastV... Blocked
Related
Assigned Teams:
Replication
Participants:

 Description   

During an election, if disk operations are failing on a voting node, the real election can never succeed because we attempt to store the last vote document on disk before voting. However, we don't perform a disk operation during dry-run elections, so they can still succeed. We can get into a state where a node repeatedly votes 'yes' in the dry-run election but times out (due to the disk operation) in the real election. Because each real election attempt increments the term, this can cause the term to escalate quickly to a very high number.
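The failure loop described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a single voter whose vote the candidate needs, a failing disk modeled as a vote that never arrives, and hypothetical names (`Voter`, `run_elections`) that do not correspond to the server's code.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    """Hypothetical voting node; not the server's actual data structure."""
    disk_ok: bool = True

    def dry_run_vote(self) -> bool:
        # The dry run does no disk access, so a broken disk goes unnoticed.
        return True

    def real_vote(self) -> bool:
        # The real vote must persist the last vote document first.
        # A failing disk is modeled as a vote that never arrives (timeout).
        return self.disk_ok


def run_elections(voter: Voter, start_term: int, attempts: int) -> int:
    """Return the candidate's term after up to `attempts` election attempts."""
    term = start_term
    for _ in range(attempts):
        if not voter.dry_run_vote():
            continue  # Dry run failed: no real election, term unchanged.
        term += 1  # Starting the real election increments the term.
        if voter.real_vote():
            break  # Elected; stop retrying.
    return term
```

In this model, `run_elections(Voter(disk_ok=False), start_term=1, attempts=50)` ends at term 51, while a voter with a healthy disk lets the candidate win at term 2.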

 

As suggested, we should investigate whether we can perform a 'fake' disk write in the dry-run election, so that if disk operations are failing, the dry-run election fails as well.
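One possible shape for such a 'fake' write is a throwaway probe file that is written, fsynced, and removed before the dry-run vote is granted. The sketch below is illustrative only and is not how the server implements voting: `disk_is_writable`, `dry_run_vote`, and the `data_dir` and `candidate_is_acceptable` parameters are hypothetical, and a real check would also need a timeout, because a hung disk may block instead of returning an error.

```python
import os
import tempfile


def disk_is_writable(data_dir: str) -> bool:
    """Probe the disk by writing and fsyncing a small throwaway file."""
    try:
        fd, path = tempfile.mkstemp(prefix="dry_run_probe_", dir=data_dir)
    except OSError:
        return False  # Could not even create the probe file.
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # Force the write through to the storage device.
        return True
    except OSError:
        return False
    finally:
        # Best-effort cleanup; a cleanup failure must not mask the result.
        try:
            os.close(fd)
        except OSError:
            pass
        try:
            os.remove(path)
        except OSError:
            pass


def dry_run_vote(data_dir: str, candidate_is_acceptable: bool) -> bool:
    """Grant a dry-run vote only if the disk probe also succeeds."""
    return candidate_is_acceptable and disk_is_writable(data_dir)
```

With a check like this, a node whose disk rejects writes would answer 'no' in the dry run, so the candidate would not go on to start a real election and increment the term.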

