- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- 29
In replsettest.js, initiateWithAnyNodeAsPrimary does the following:
1. Call replSetInitiate on one node with a one-node config
2. Call getPrimary(), which initializes self._slaves
3. Call replSetReconfig in a loop to add the remaining nodes one at a time
4. Call this.awaitSecondaryNodes(self.kDefaultTimeoutMS, self._slaves, 25 /* retryIntervalMS */);
5. In awaitSecondaryNodes, call isMaster on each node in "slaves". Repeat until all slave nodes are secondaries/arbiters.
If there's an election at any time after Step 3, then one of the members of self._slaves could now be a primary. However, awaitSecondaryNodes keeps trying the same set of nodes until it times out.
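The failure mode can be reproduced in isolation. The sketch below is not the ReplSetTest code and not necessarily the fix that was committed. makeNode, awaitSecondariesStale and awaitSecondariesRefreshing are hypothetical helpers that only mimic the ismaster, secondary and arbiterOnly fields of an isMaster reply, with the timeout replaced by a fixed attempt count.

```javascript
// Hypothetical stand-in for a replica set node. isMaster() returns only the
// reply fields that matter here: ismaster, secondary and arbiterOnly.
function makeNode(name, state) {
    return {
        name,
        state,
        isMaster() {
            return {
                ismaster: this.state === "PRIMARY",
                secondary: this.state === "SECONDARY",
                arbiterOnly: this.state === "ARBITER",
            };
        },
    };
}

// Stale variant: polls a fixed list that was captured before the election.
function awaitSecondariesStale(staleSecondaries, maxAttempts) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const allReady = staleSecondaries.every((node) => {
            const res = node.isMaster();
            return res.secondary || res.arbiterOnly;
        });
        if (allReady) {
            return true;
        }
    }
    return false;  // the real helper would hit its timeout here
}

// Refreshing variant (an assumption about one possible remedy): work out on
// every attempt which nodes are currently non-primary, then check only those.
function awaitSecondariesRefreshing(allNodes, maxAttempts) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const replies = allNodes.map((node) => node.isMaster());
        const primaryCount = replies.filter((res) => res.ismaster).length;
        const nonPrimaries = replies.filter((res) => !res.ismaster);
        const allReady = nonPrimaries.every((res) => res.secondary || res.arbiterOnly);
        if (primaryCount === 1 && allReady) {
            return true;
        }
    }
    return false;
}

const nodes = [
    makeNode("n0", "PRIMARY"),
    makeNode("n1", "SECONDARY"),
    makeNode("n2", "SECONDARY"),
];

// Captured while n0 is still primary, like self._slaves after getPrimary().
const staleSecondaries = nodes.slice(1);

// An election happens during the reconfig loop: n1 wins, n0 steps down.
nodes[0].state = "SECONDARY";
nodes[1].state = "PRIMARY";

const staleResult = awaitSecondariesStale(staleSecondaries, 5);
const refreshedResult = awaitSecondariesRefreshing(nodes, 5);
console.log({staleResult, refreshedResult});
```

With the stale list, n1 is polled forever as if it were a secondary, so the wait never succeeds. Recomputing the set on each attempt lets the wait succeed once the set has exactly one primary.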
Observed in replsettest_control_12_nodes.js. It's probably more common now for a machine to get overloaded, causing heartbeat timeouts and elections:
- The test starts 12 nodes, the upper limit
- The nodes are all started in parallel after SERVER-43772
- There is more time spent in step 3 now that SERVER-45079 requires we add one member at a time
Is caused by: SERVER-43766 Investigate the slowest sections of ReplSetTest.initiate and remove any wasted downtime (Closed)