Problem
In BF-27059 we see a bug where a retryable write fails on a freshly initial-synced node because the node tries to retry operations that are not in its oplog. This is a very consistent build failure that has been ongoing for over a month, with a 33% failure rate.
Solution & Acceptance Criteria
The current ContinuousInitialSync hook randomly chooses either to promote the initial sync node immediately after the sync completes or to wait before promotion. The bug only occurs on immediate promotion, so the solution is to remove the random choice and always wait before promotion.
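The behavior change described above can be sketched roughly as follows. This is only an illustration, not the actual ContinuousInitialSync code: the function name, the `always_wait` flag, the 5-second wait, and the equal odds between the two paths are all assumptions made for the example.

```python
import random

# Hypothetical names and values throughout; not the real hook implementation.
WAIT_BEFORE_PROMOTION_SECS = 5.0  # assumed wait time, for illustration only


def promotion_delay(rng, always_wait):
    """Return how many seconds to wait between sync completion and promotion."""
    if always_wait:
        # Proposed behavior: never promote immediately.
        return WAIT_BEFORE_PROMOTION_SECS
    # Current behavior as described: a random choice between immediate
    # promotion and a delayed one (equal odds assumed here).
    return rng.choice([0.0, WAIT_BEFORE_PROMOTION_SECS])


rng = random.Random(0)
old_delays = [promotion_delay(rng, always_wait=False) for _ in range(1000)]
new_delays = [promotion_delay(rng, always_wait=True) for _ in range(1000)]
```

With `always_wait=True`, the immediate-promotion path (a delay of zero) is never taken, which is the only path on which the failure has been observed.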
Impact
This change reduces the total number of initial syncs done during the suite (from ~100 to ~50 on average). That should still be sufficient to test initial sync, and the count can be raised in the future by adjusting the wait time before promotion.