During an election, if disk operations are failing on a voting node, that node can never grant its vote in the real election, because it must first store its last vote document on disk. Dry-run elections, however, perform no disk operation, so they can still succeed. We can therefore get into a state where a node repeatedly votes 'yes' in the dry-run election but times out (because of the disk operation) in the real election. Since each successful dry run lets the candidate increment its term and start another real election, the term can quickly escalate to a very high number.
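The failure mode above can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not the server's election code: `Voter`, `persist_last_vote`, `request_vote`, `DiskError`, and `run_election_rounds` are all hypothetical names, there is a single voter whose vote the candidate needs, and a failed disk write is modeled as an exception that stands in for the timeout.

```python
class DiskError(Exception):
    """Hypothetical: raised by the simulated storage layer when a write fails."""


class Voter:
    def __init__(self, disk_ok: bool):
        self.disk_ok = disk_ok
        self.term = 0

    def persist_last_vote(self, term: int, candidate: str) -> None:
        # Hypothetical stand-in for storing the last vote document on disk.
        if not self.disk_ok:
            raise DiskError("write failed")

    def request_vote(self, term: int, candidate: str, dry_run: bool) -> bool:
        if term < self.term:
            return False
        if dry_run:
            # Dry run: no disk operation, so a broken disk goes unnoticed.
            return True
        self.term = term
        # Real election: the vote must be persisted before it is granted.
        self.persist_last_vote(term, candidate)
        return True


def run_election_rounds(voter: Voter, rounds: int) -> int:
    """Return the candidate's term after at most `rounds` election attempts."""
    term = 0
    for _ in range(rounds):
        # The dry run asks about the next term without incrementing it.
        if not voter.request_vote(term + 1, "candidate", dry_run=True):
            continue
        term += 1  # dry run passed: bump the term for the real election
        try:
            voter.request_vote(term, "candidate", dry_run=False)
            return term  # real election won
        except DiskError:
            pass  # models the timeout: no vote arrives, so try again
    return term
```

With a healthy disk the candidate wins at term 1; with a failing disk every round passes the dry run and fails the real election, so the term grows by one per attempt.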
As suggested, we should investigate whether the dry-run election can perform a 'fake' disk write, so that if disk operations are failing on a voting node, the dry-run election fails as well.
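One possible shape for the 'fake' write is a probe that creates, fsyncs, and deletes a small file before a dry-run vote is granted. This is a hedged sketch, not a proposed implementation: `probe_disk_write` and `handle_dry_run_vote` are hypothetical names, and it assumes the probe is written to the same directory (and therefore the same device) that holds the last vote document.

```python
import os
import tempfile


def probe_disk_write(data_dir: str) -> bool:
    """Hypothetical 'fake' write: create, fsync, and delete a small file.

    Returns False if any step raises OSError, e.g. when the disk is full,
    read-only, or the directory is unavailable.
    """
    try:
        fd, path = tempfile.mkstemp(dir=data_dir, prefix=".vote_probe_")
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the bytes to the device, as a real vote write would
        finally:
            os.close(fd)
            os.remove(path)
        return True
    except OSError:
        return False


def handle_dry_run_vote(data_dir: str, candidate_term: int, my_term: int) -> bool:
    """Grant a dry-run vote only if the term check passes and the disk is writable."""
    if candidate_term < my_term:
        return False
    return probe_disk_write(data_dir)
```

A successful probe does not guarantee that the later real write will succeed, and it adds I/O latency to every dry run. It only makes a persistently failing disk visible before the candidate increments its term.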