[SERVER-44066] Investigate and optimize the slowest portions of ReplSetTest.start when not waiting for a connection Created: 17/Oct/19  Updated: 06/Dec/22  Resolved: 16/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor - P4
Reporter: William Schultz (Inactive) Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done after SERVER-43772 Start up mongod replica set nodes in ... Closed
Assigned Teams:
Replication
Sprint: Repl 2019-11-04
Participants:

 Description   

After parallelizing the startup of nodes in ReplSetTest.startSet, early tests show that ReplSetTest.start can take several hundred milliseconds even when we don't wait for a connection to the mongod node. Measured on one Linux workstation, it often takes more than 300 milliseconds. Because each call to ReplSetTest.start must return before the next mongod node can be started, this sequential cost limits the scalability of ReplSetTest.startSet when starting several nodes. This affects the performance goal of the Faster Local Testing project, which is for startSet on N nodes to take no more than 1.5x as long as startSet on 1 node. The effect of ReplSetTest.start latency is visible in this simple benchmark, a test that does nothing but repeatedly start up and shut down replica sets of increasing size:

[js_test:replsettest_control_scale] 2019-10-17T12:35:37.221-0400 ReplSetTest start took 304ms for node 0
[js_test:replsettest_control_scale] 2019-10-17T12:35:37.827-0400 ReplSetTest startSet took 919ms for 1 nodes.
[js_test:replsettest_control_scale] 2019-10-17T12:35:39.625-0400 ReplSetTest start took 263ms for node 0
[js_test:replsettest_control_scale] 2019-10-17T12:35:39.938-0400 ReplSetTest start took 308ms for node 1
[js_test:replsettest_control_scale] 2019-10-17T12:35:40.543-0400 ReplSetTest startSet took 1189ms for 2 nodes.
[js_test:replsettest_control_scale] 2019-10-17T12:35:41.646-0400 ReplSetTest start took 266ms for node 0
[js_test:replsettest_control_scale] 2019-10-17T12:35:41.966-0400 ReplSetTest start took 314ms for node 1
[js_test:replsettest_control_scale] 2019-10-17T12:35:42.332-0400 ReplSetTest start took 365ms for node 2
[js_test:replsettest_control_scale] 2019-10-17T12:35:42.939-0400 ReplSetTest startSet took 1568ms for 3 nodes.
[js_test:replsettest_control_scale] 2019-10-17T12:35:44.557-0400 ReplSetTest start took 263ms for node 0
[js_test:replsettest_control_scale] 2019-10-17T12:35:44.870-0400 ReplSetTest start took 309ms for node 1
[js_test:replsettest_control_scale] 2019-10-17T12:35:45.226-0400 ReplSetTest start took 355ms for node 2
[js_test:replsettest_control_scale] 2019-10-17T12:35:45.608-0400 ReplSetTest start took 373ms for node 3
[js_test:replsettest_control_scale] 2019-10-17T12:35:46.005-0400 ReplSetTest startSet took 1720ms for 4 nodes.
[js_test:replsettest_control_scale] 2019-10-17T12:35:47.769-0400 ReplSetTest start took 240ms for node 0
[js_test:replsettest_control_scale] 2019-10-17T12:35:48.060-0400 ReplSetTest start took 285ms for node 1
[js_test:replsettest_control_scale] 2019-10-17T12:35:48.388-0400 ReplSetTest start took 328ms for node 2
[js_test:replsettest_control_scale] 2019-10-17T12:35:48.765-0400 ReplSetTest start took 372ms for node 3
[js_test:replsettest_control_scale] 2019-10-17T12:35:49.188-0400 ReplSetTest start took 415ms for node 4
[js_test:replsettest_control_scale] 2019-10-17T12:35:49.586-0400 ReplSetTest startSet took 2064ms for 5 nodes.

Even with only 5 nodes, startSet time (2064ms) is already about 2.2x the time for 1 node (919ms), which misses the performance goal of 1.5x. We should investigate the slowest parts of ReplSetTest.start and see which portions can be optimized.
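
For reference, the ratio quoted above can be recomputed from the log output with a short script. The following is a minimal sketch, not part of the benchmark itself: the function name summarizeStartSetLog and the result field names are hypothetical, and the regular expressions assume only the two message formats shown in the log lines above. It runs under plain Node.js and does not use any ReplSetTest API.

```javascript
// Hypothetical helper: parse the benchmark log lines quoted in this ticket
// and summarize how startSet time scales with the number of nodes.
// Only the two message formats shown in the log above are recognized.
const START_RE = /ReplSetTest start took (\d+)ms for node (\d+)/;
const START_SET_RE = /ReplSetTest startSet took (\d+)ms for (\d+) nodes/;

function summarizeStartSetLog(logText) {
    const runs = [];
    let pendingStartMs = [];  // per-node start times seen since the last startSet line
    for (const line of logText.split("\n")) {
        const startMatch = START_RE.exec(line);
        if (startMatch) {
            pendingStartMs.push(Number(startMatch[1]));
            continue;
        }
        const setMatch = START_SET_RE.exec(line);
        if (setMatch) {
            runs.push({
                nodes: Number(setMatch[2]),
                startSetMs: Number(setMatch[1]),
                // Sum of the individual ReplSetTest.start durations for this run.
                sequentialMs: pendingStartMs.reduce((a, b) => a + b, 0),
            });
            pendingStartMs = [];
        }
    }
    // Ratios are relative to the 1-node run, matching the 1.5x project goal.
    const baseline = runs.find((r) => r.nodes === 1);
    return runs.map((r) => ({
        ...r,
        ratioToOneNode: baseline ? r.startSetMs / baseline.startSetMs : NaN,
    }));
}
```

Applied to the log above, the 5-node run yields a sequentialMs of 1640 (240 + 285 + 328 + 372 + 415) out of a startSetMs of 2064, and a ratioToOneNode of 2064 / 919, or roughly 2.2.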



 Comments   
Comment by William Schultz (Inactive) [ 16/Dec/19 ]

This appears to be a measurable issue only when running on my particular Linux workstation, and the performance impact is not terribly significant. Closing this since its impact may be negligible.

Comment by William Schultz (Inactive) [ 17/Oct/19 ]

The current plan is to first finish and commit the work to parallelize node startup in ReplSetTest.startSet, and then measure the performance (as part of the work in this ticket) to see whether startSet scalability meets the project goals.

Generated at Thu Feb 08 05:04:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.