Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.6.9, 4.0.4, 4.1.3
Affects Version/s: None
Component/s: Testing Infrastructure
Labels:
- tig-resmoke

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.0, v3.6
Sprint:
TIG 2018-09-10
Linked BF Score:
19
Story Points:
2
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The replica_sets_kill_primary_jscore_passthrough tests occasionally time out while waiting for a primary to be elected.

The tests increase the election timeout to 24 hours so that they control which node is the primary. However, this can lead to a situation where the primary has been killed and neither secondary was able to take over because both had stale oplogs. When the killed server is brought back up and attempts to step up, there is a chance it has not yet received heartbeat responses from the other nodes in the replica set and assumes they are down. The step-up then fails, and because no further election is attempted, the test eventually times out.

A possible solution would be to retry the step-up after some delay when it fails. This would give the secondaries more time to respond to the heartbeat requests.
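The retry idea can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the change that was committed for this ticket: `retry_step_up`, `attempt_step_up`, `max_attempts`, and `delay_secs` are hypothetical names, and `attempt_step_up` stands in for whatever call the test fixture uses to ask the restarted node to step up.

```python
import time
from typing import Callable


def retry_step_up(
    attempt_step_up: Callable[[], bool],
    max_attempts: int = 5,
    delay_secs: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call attempt_step_up until it reports success.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails. All names here are hypothetical; attempt_step_up is a
    placeholder for the fixture's real step-up request.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if attempt_step_up():
            return attempt
        # Wait before retrying so the other nodes have time to answer
        # heartbeats; no wait is needed after the final attempt.
        if attempt < max_attempts:
            sleep(delay_secs)
    raise RuntimeError(f"step-up failed after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays. Bounding the number of attempts means a node that can never win an election still surfaces as a clear error rather than as a suite-level timeout.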

is related to

SERVER-35383 Increase electionTimeoutMillis for the ContinuousStepdown hook used in stepdown suites

Closed

Assignee: Jonathan Abrahams (Inactive)
Reporter: David Bradford (Inactive)
Participants: David Bradford, Githook User, Jonathan Abrahams, Max Hirschhorn
Votes: 0
Watchers: 6

Created: Aug 03 2018 09:34:50 PM UTC
Updated: Oct 29 2023 10:29:17 PM UTC
Resolved: Aug 28 2018 06:38:38 PM UTC
Confidence Status Last Update: 28/Aug/18 12:55 PM
