  1. Core Server
  2. SERVER-45765

Race in ReplSetTest.initiateWithAnyNodeAsPrimary

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.3.3
    • Affects Version/s: None
    • Component/s: Replication
    • None
    • Fully Compatible
    • ALL
    • 29

      In replsettest.js, initiateWithAnyNodeAsPrimary:

      1. Call replSetInitiate on one node with a one-node config
      2. Call getPrimary(), which initializes self._slaves
      3. Call replSetReconfig in a loop to add remaining nodes one at a time
      4. Call this.awaitSecondaryNodes(self.kDefaultTimeoutMS, self._slaves, 25 /* retryIntervalMS */);
      5. In awaitSecondaryNodes, call isMaster on each node in "slaves". Repeat until all slave nodes are secondaries/arbiters.

      If there's an election any time after Step 3, one of the members of self._slaves could now be a primary. However, awaitSecondaryNodes keeps polling the same stale set of nodes, so the new primary never reports as a secondary and the call retries until it times out.
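
      The race can be illustrated with a small in-memory model. This is only a sketch: makeNode, awaitFixedList and awaitRefreshed are hypothetical names, not part of ReplSetTest, and the bounded retry count stands in for the real timeout. The second function shows one possible way to avoid the stale list; it is not necessarily the fix that was applied for this ticket.

```javascript
// Hypothetical stand-in for a replica set member; only models the two
// isMaster fields that matter for this race.
function makeNode(name, state) {
    return {
        name,
        state,
        isMaster() {
            return {
                ismaster: this.state === "PRIMARY",
                secondary: this.state === "SECONDARY",
            };
        },
    };
}

// Problematic pattern: poll a list of "secondaries" captured earlier.
// If one of them has since been elected primary, this can never succeed.
function awaitFixedList(nodes, maxAttempts) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        if (nodes.every((n) => n.isMaster().secondary)) {
            return true;
        }
    }
    return false;  // models the timeout
}

// Possible alternative: rediscover the primary on every attempt and wait
// only on the remaining nodes.
function awaitRefreshed(allNodes, maxAttempts) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const primary = allNodes.find((n) => n.isMaster().ismaster);
        if (primary === undefined) {
            continue;  // no primary yet, retry
        }
        const others = allNodes.filter((n) => n !== primary);
        if (others.every((n) => n.isMaster().secondary)) {
            return true;
        }
    }
    return false;
}

const a = makeNode("a", "PRIMARY");
const b = makeNode("b", "SECONDARY");
const c = makeNode("c", "SECONDARY");
const allNodes = [a, b, c];

// Step 2: the list of secondaries is captured while "a" is primary.
const staleSecondaries = [b, c];

// An election after Step 3: "a" steps down and "b" becomes primary.
a.state = "SECONDARY";
b.state = "PRIMARY";

const fixedResult = awaitFixedList(staleSecondaries, 5);
const refreshedResult = awaitRefreshed(allNodes, 5);
console.log({fixedResult, refreshedResult});
```

      In this model the fixed-list wait fails because "b" is now primary, while the refreshed wait succeeds because it re-derives the set of non-primary nodes on each attempt.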

      Observed in replsettest_control_12_nodes.js. It's probably more common now for a machine to get overloaded, causing heartbeat timeouts and elections:

      1. The test starts 12 nodes, the upper limit
      2. The nodes are all started in parallel after SERVER-43772
      3. There is more time spent in step 3 now that SERVER-45079 requires we add one member at a time

            Assignee:
            jesse@mongodb.com A. Jesse Jiryu Davis
            Reporter:
            jesse@mongodb.com A. Jesse Jiryu Davis
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: