Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.1
Affects Version/s: None
Component/s: Internal Code
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:

https://logkeeper.mongodb.org/lobster/build/963b4308c9524506b0757603f70ff063/test/5d942c83c2ab68380ca0e7a0#bookmarks=0%2C38606&l=1

Sprint:
Dev Tools 2019-10-07, Dev Tools 2019-10-21
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

An upgrade to PseudoRandom was reverted because a test relied on the specific bits output by PseudoRandom(0) (see ~~SERVER-43641~~). This is not a good situation.

Tests must not hardcode generator output. If they do, we can never improve the generators without updating every such test.

In db/repl/replication_coordinator_impl_elect_v1_test.cpp, three tests from the TakeoverTest suite are affected:

CatchupTakeoverCallbackCanceledIfElectionTimeoutRuns
DontCallForPriorityTakeoverWhenLaggedDifferentSecond
DontCallForPriorityTakeoverWhenLaggedSameSecond

These only seem to work reliably when fed a (now legacy) PseudoRandom initialized with a seed of 0. Otherwise the election timeouts are randomized in such a way that the test never reaches the desired state, and it fails.
This is extremely fragile and should be fixed as soon as possible.

The ReplCoordinatorImpl takes a seed in its constructor. From this seed it makes a PseudoRandom, which it uses to generate electionTimeout intervals. This makes test setup hit-or-miss: a test has to hope to find a seed that puts the replica set into the desired state, and any such seed would need to be updated after every small tweak to the random number generator, the interval upperBound, and so on. Tests need to control the election timeout durations directly in order to get the replica set into their desired state, so the constructor should take a Duration generator rather than a seed.
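A minimal sketch of the "Duration generator instead of a seed" idea, under stated assumptions. `CoordinatorSketch`, `TimeoutGenerator`, and `scriptedOffsets` are hypothetical names invented for illustration; they are not part of the actual ReplCoordinatorImpl interface, and `std::chrono::milliseconds` stands in for the server's own Duration type.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical: a callable that yields the next randomized timeout offset.
using TimeoutGenerator = std::function<Milliseconds()>;

// Hypothetical stand-in for the coordinator, not the real class.
// It receives a generator rather than a seed, so the caller decides
// where the offsets come from.
class CoordinatorSketch {
public:
    CoordinatorSketch(Milliseconds baseTimeout, TimeoutGenerator nextOffset)
        : _baseTimeout(baseTimeout), _nextOffset(std::move(nextOffset)) {}

    // Base timeout plus whatever offset the injected generator supplies.
    Milliseconds nextElectionTimeout() {
        return _baseTimeout + _nextOffset();
    }

private:
    Milliseconds _baseTimeout;
    TimeoutGenerator _nextOffset;
};

// Test helper: replays a fixed script of offsets and keeps returning
// the last entry once the script is exhausted.
TimeoutGenerator scriptedOffsets(std::vector<Milliseconds> script) {
    return [script = std::move(script), i = std::size_t{0}]() mutable {
        if (script.empty()) {
            return Milliseconds{0};
        }
        Milliseconds out = script[i];
        if (i + 1 < script.size()) {
            ++i;
        }
        return out;
    };
}
```

With this shape, production code could pass a generator backed by a real PRNG, while a test passes `scriptedOffsets({...})` and states exactly which timeouts it needs, independent of any generator's bit pattern.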

For the moment I'm going to bring the entire PseudoRandom "XorShift" implementation into the test as a generator.
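As a rough illustration of what a test-local generator might look like, the sketch below freezes a copy of Marsaglia's classic xor128 xorshift generator inside the test. The class name `FrozenXorShift128` is hypothetical, and this is not claimed to be bit-identical to the legacy PseudoRandom; it only shows the idea of pinning the algorithm in the test so that later changes to the shared generator cannot alter the test's sequence.

```cpp
#include <cstdint>

// Hypothetical test-local generator: Marsaglia's xor128 xorshift.
// The seed replaces the first state word; the other three words use
// the constants from Marsaglia's paper, so a seed of 0 is still valid.
class FrozenXorShift128 {
public:
    explicit FrozenXorShift128(std::uint32_t seed) : _x(seed) {}

    // Advances the state and returns the next 32-bit value.
    std::uint32_t next() {
        std::uint32_t t = _x ^ (_x << 11);
        _x = _y;
        _y = _z;
        _z = _w;
        _w = _w ^ (_w >> 19) ^ (t ^ (t >> 8));
        return _w;
    }

    // Returns a value in [0, upperBound). upperBound must be nonzero.
    // Plain modulo is slightly biased, which is acceptable for a test.
    std::uint32_t nextBelow(std::uint32_t upperBound) {
        return next() % upperBound;
    }

private:
    std::uint32_t _x;
    std::uint32_t _y = 362436069u;
    std::uint32_t _z = 521288629u;
    std::uint32_t _w = 88675123u;
};
```

Because the algorithm lives in the test, the sequence for a given seed stays fixed regardless of what happens to the production generator. It remains a stopgap: the test still depends on one particular sequence rather than stating the timeouts it needs.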

PS: Another way to go here would be to use a FailPoint to inject an electionTimeout result, overriding the randomly generated result.

See also ~~SERVER-43767~~ (related issue in another test)

has to be done before: SERVER-43641 platform/random.h causing bugs, upgrade overdue (Closed)

Assignee: Billy Donahue
Reporter: Billy Donahue
Participants: Billy Donahue, Githook User, Will Schultz
Votes: 0
Watchers: 4

Created: Oct 03 2019 01:11:45 AM UTC
Updated: Oct 29 2023 10:16:32 PM UTC
Resolved: Oct 08 2019 09:19:08 PM UTC
