[SERVER-43778] db_repl_coordinator_test relies on specific PseudoRandom sequence Created: 03/Oct/19 Updated: 29/Oct/23 Resolved: 08/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Billy Donahue | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | |||||||||
| Sprint: | Dev Tools 2019-10-07, Dev Tools 2019-10-21 | ||||||||
| Participants: | |||||||||
| Description |
|
An upgrade to PseudoRandom was reverted due to a test relying on the specific bits output by PseudoRandom(0). This is not a good situation ( Tests must not hardcode this sort of thing. If we do, we can never make improvements to the generators without updating all such tests. Regarding db/repl/replication_coordinator_impl_elect_v1_test.cpp: CatchupTakeoverCallbackCanceledIfElectionTimeoutRuns These only seem to work reliably when fed a (now legacy) PseudoRandom The ReplCoordinatorImpl takes a seed in its constructor. From this seed it makes a PseudoRandom which it uses to generate electionTimeout intervals. This is very hit-or-miss, and a test would have to hope to find a seed that puts the RS into a desired state, and such a seed, if found, would need to be updated with every little tweak of the random number generator or the interval upperBound, etc. Tests really need to directly control the election timeout durations in order to get the RS into their desired state. So really the ctor should take a Duration generator rather than a seed. For the moment I'm going to bring the entire PseudoRandom "XorShift" implementation into the test as a generator. PS: Another way to go here would be to use a FailPoint to inject an electionTimeout result, overriding the randomly generated result. See also |
| Comments |
| Comment by Githook User [ 08/Oct/19 ] |
|
Author: {'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com', 'name': 'Billy Donahue'}Message: |
| Comment by Billy Donahue [ 03/Oct/19 ] |
|
Sorry, I don't see how it will fix the problem. |
| Comment by William Schultz (Inactive) [ 03/Oct/19 ] |
|
billy.donahue If there are certain tests that require election timeouts to be deterministic, I think the simplest solution would be to explicitly set the electionTimeoutOffsetLimitFraction to 0 for the tests that need it. This parameter controls the size of the random offset added to the base election timeout. If it is 0 then the election timeout values will be constant, since there is no random offset added. You can see an example of us doing this in another unit test here. I agree it's not great that certain tests rely on the specific values of the election timeout, but I think this is probably the easiest fix to avoid the problems you are seeing with the new random number generator. |
| Comment by Billy Donahue [ 03/Oct/19 ] |
|
judah.schvimer I'm not entirely satisfied with the workaround I've implemented in the CR, but OTOH I don't know enough about the subject matter here to design a logical sequence of electionTimeouts to inject. The workaround unblocks the changes I need to make to random.h. What do you think? |
| Comment by Billy Donahue [ 03/Oct/19 ] |