[SERVER-43772] Start up mongod replica set nodes in ReplSetTest.startSet in parallel
Created: 02/Oct/19  Updated: 29/Oct/23  Resolved: 28/Oct/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Task
Priority: Major - P3
Reporter: William Schultz (Inactive)
Assignee: William Schultz (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 5db08392e3c331774a397b0d,enterprise-rhel-62-64-bit,563dc7451690efa475db5feda913098e777471da.png    
Issue Links:
Gantt Dependency
has to be done before SERVER-44066 Investigate and optimize the slowest ... Closed
Related
related to SERVER-27342 Do not block unnecessarily on connect... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-10-21, Repl 2019-11-04
Participants:

 Description   

Currently, ReplSetTest.startSet starts the mongod process for each replica set node serially, one after the other. This is slow mainly because we wait until we can connect to a mongod node before starting the next one. Instead of waiting for a connection to each node before moving on, we can start all of the processes in parallel and then wait until a connection can be made to each one.
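The difference between the two strategies can be sketched with a small timing model. This is only an illustration, not the ReplSetTest implementation: the `spawnMs` and `readyMs` fields, both function names, and the example durations are hypothetical and are not measurements from this ticket.

```javascript
// Hypothetical model: each node has a cost to launch its process (spawnMs)
// and a delay, measured from launch, until it accepts connections (readyMs).

function serialStartupMs(nodes) {
    // Launch a node, block until it is connectable, then move to the next.
    let clock = 0;
    for (const node of nodes) {
        clock += node.spawnMs;  // launch the process
        clock += node.readyMs;  // wait until we can connect
    }
    return clock;
}

function parallelStartupMs(nodes) {
    // Phase 1: launch every process without waiting for a connection.
    let clock = 0;
    const readyAt = [];
    for (const node of nodes) {
        clock += node.spawnMs;
        readyAt.push(clock + node.readyMs);
    }
    // Phase 2: wait for each node in turn. A node that became ready while
    // we were waiting on an earlier one adds no further delay.
    for (const t of readyAt) {
        clock = Math.max(clock, t);
    }
    return clock;
}

// Assumed, made-up durations for a 12-node set.
const nodes = Array.from({length: 12}, () => ({spawnMs: 10, readyMs: 800}));
console.log(serialStartupMs(nodes));    // 12 * (10 + 800) = 9720
console.log(parallelStartupMs(nodes));  // 12 * 10 + 800 = 920
```

In this model the serial cost grows with the sum of the per-node waits, while the parallel cost grows only with the launch overhead plus the slowest single wait.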



 Comments   
Comment by Githook User [ 28/Oct/19 ]

Author: William Schultz (will62794) <william.schultz@mongodb.com>

Message: SERVER-43772 Start up nodes in ReplSetTest.startSet in parallel

This patch allows the startup procedure of replica set nodes in ReplSetTest to proceed in parallel, by starting up the processes of all mongod nodes first before waiting to connect to each. This parallel startup behavior is now the default in ReplSetTest.
Branch: master
https://github.com/mongodb/mongo/commit/d1b2abee0e6da744d38d46392df05bdae0091f11

Comment by William Schultz (Inactive) [ 23/Oct/19 ]

The results of the control tests on my local workstation (when running with ramdisk) are a bit worse:

[js_test:replsettest_control_1_node] 2019-10-23T15:48:56.419-0400 ReplSetTest startSet took 1065ms for 1 nodes.
[js_test:replsettest_control_12_nodes] 2019-10-23T15:48:12.452-0400 ReplSetTest startSet took 3222ms for 12 nodes.

This is a scale factor of (3222/1065)=3.03, which is outside the 1.5x goal. I am not sure why there is such a discrepancy between the Evergreen machines and my local machine, but I will investigate this further as part of SERVER-44066. The Evergreen results demonstrate that the performance goal is achievable. There may be environmental reasons for the apparent slowness of my local workstation compared to the Evergreen AWS machines.

Comment by William Schultz (Inactive) [ 23/Oct/19 ]

After the initial changes from this ticket, we can see the performance improvements for ReplSetTest.startSet. The results from this patch build task show that the startSet histogram profile is now tightly grouped around a single duration, regardless of replica set size.

We can also observe the startSet durations in the ReplSetTest control tests. Prior to these changes, runs of the control tests from this task took:

replsettest_control_1_node.js: 808ms
replsettest_control_12_nodes.js: 9893ms

This is a scale factor of (9893/808)=12.24, which is to be expected with no parallelization. After the changes, we observe the following control test durations from this task:

replsettest_control_1_node.js: 809ms
replsettest_control_12_nodes.js: 1069ms

This is a scale factor of (1069/809)=1.32, which is well within the project goal of <= 1.5x.
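The scale factor arithmetic above can be reproduced with a short script. The durations are the control test numbers quoted in this comment; the `scaleFactor` and `meetsGoal` helpers are hypothetical names used only for this illustration.

```javascript
// Project goal stated in this comment: 12-node startSet <= 1.5x the 1-node time.
const GOAL = 1.5;

// Ratio of the 12-node startSet duration to the 1-node duration.
function scaleFactor(oneNodeMs, twelveNodeMs) {
    return twelveNodeMs / oneNodeMs;
}

function meetsGoal(factor) {
    return factor <= GOAL;
}

const before = scaleFactor(808, 9893);  // control tests before the change
const after = scaleFactor(809, 1069);   // control tests after the change

console.log(before.toFixed(2), meetsGoal(before));  // 12.24 false
console.log(after.toFixed(2), meetsGoal(after));    // 1.32 true
```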
