Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-43778

db_repl_coordinator_test relies on specific PseudoRandom sequence

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.1
    • Component/s: Internal Code
    • Labels:
      None

      Description

      An upgrade to PseudoRandom was reverted due to a test relying on the specific bits output by PseudoRandom(0). This is not a good situation (SERVER-43641).

      Tests must not hardcode this sort of thing. If we do, we can never make improvements to the generators without updating all such tests.

      Regarding db/repl/replication_coordinator_impl_elect_v1_test.cpp:
      3 tests from the TakeoverTest/ suite are affected:

      CatchupTakeoverCallbackCanceledIfElectionTimeoutRuns
      DontCallForPriorityTakeoverWhenLaggedDifferentSecond
      DontCallForPriorityTakeoverWhenLaggedSameSecond

      These only seem to work reliably when fed a (now legacy) PseudoRandom
      initialized with a seed of 0. Otherwise the election timeouts are randomized in such a way that the test doesn't reach the desired state, and it fails.
      This is extremely fragile and should be fixed asap.

      The ReplCoordinatorImpl takes a seed in its constructor. From this seed it makes a PseudoRandom which it uses to generate electionTimeout intervals. This is very hit-or-miss, and a test would have to hope to find a seed that puts the RS into a desired state, and such a seed, if found, would need to be updated with every little tweak of the random number generator or the interval upperBound, etc. Tests really need to directly control the election timeout durations in order to get the RS into their desired state. So really the ctor should take a Duration generator rather than a seed.

      For the moment I'm going to bring the entire PseudoRandom "XorShift" implementation into the test as a generator.

      PS: Another way to go here would be to use a FailPoint to inject an electionTimeout result, overriding the randomly generated result.

      See also SERVER-43767 (related issue in another test)

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: