Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.4.22, 3.6.14, 4.1.10, 4.0.11
Affects Version/s: None
Component/s: Replication
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6, v3.4
Sprint: Repl 2019-04-08, Repl 2019-04-22
Linked BF Score: 0
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We've observed multiple cases in the sys-perf and sys-perf-4.0 Evergreen projects where a 2-node replica set, 2-shard cluster is restarted and one of the replica set shards fails to elect a primary after 11 attempts spanning ~2 minutes. Both nodes in the 2-node replica set repeatedly ran for election at the same time, and on every attempt each node had already voted for itself in that term, so neither could grant its vote to the other. Random jitter is added to the election timeout, but it comes from a PseudoRandom that is seeded with the current time on startup. The performance infrastructure spawns mongod processes concurrently, and it appears that the startup time, and thus the seed for ReplicationCoordinatorImpl::_random, can end up identical on both nodes.
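The effect of a shared seed can be reproduced outside the server. The following is a minimal Python sketch, not the server's C++ code: `election_timeouts`, the 10-second base timeout, the 1.5-second jitter bound, and the example seed values are all assumptions chosen for illustration. It shows only that two generators seeded with the same startup time produce the same jitter on every attempt, so the two nodes' timers keep firing together.

```python
import random

# Hypothetical values for illustration; not read from the server source.
ELECTION_TIMEOUT_MS = 10_000
MAX_JITTER_MS = 1_500


def election_timeouts(seed, attempts=11):
    """Return the jittered election timeout for each attempt of one node."""
    rng = random.Random(seed)  # stands in for a time-seeded PseudoRandom
    return [ELECTION_TIMEOUT_MS + rng.randrange(MAX_JITTER_MS)
            for _ in range(attempts)]


# Both processes are spawned concurrently and read the same startup time.
startup_time = 1553573800  # made-up timestamp

node_a = election_timeouts(startup_time)
node_b = election_timeouts(startup_time)

# Count the attempts on which both nodes time out at the same moment.
same_seed_collisions = sum(a == b for a, b in zip(node_a, node_b))
print("attempts with identical timeout:", same_seed_collisions)

# For contrast, two nodes whose seeds differ draw different jitter sequences.
node_a_distinct = election_timeouts(startup_time)
node_b_distinct = election_timeouts(startup_time + 1)
print("sequences differ with distinct seeds:",
      node_a_distinct != node_b_distinct)
```

With identical seeds, all 11 attempts collide, which matches the repeated simultaneous elections described above. The contrast case only illustrates why the seed matters; it is not a description of the change that resolved this ticket.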

Assignee: Siyuan Zhou
Reporter: Max Hirschhorn
Participants: Andy Schwerin, Githook User, Max Hirschhorn, Siyuan Zhou
Votes: 0
Watchers: 6

Created: Mar 26 2019 04:16:40 AM UTC
Updated: Oct 29 2023 10:22:37 PM UTC
Resolved: Apr 08 2019 06:03:53 PM UTC
