[SERVER-43773] ShardingTest should run the startup procedure for all of its ReplSetTest shard instances in parallel Created: 02/Oct/19  Updated: 29/Oct/23  Resolved: 27/Nov/19

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: 4.3.3

Type: Task Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File none,enterprise-rhel-62-64-bit,2c2ead223bfcdb612a55e3fc5a0b4989a3930165,startupAndInitiate.png     PNG File none,enterprise-rhel-62-64-bit,e164e7d031a1d4a22a03df3382316f6107ab81c8,startupAndInitiate.png    
Issue Links:
Problem/Incident
Related
related to SERVER-43774 ShardingTest should initiate all of i... Closed
related to SERVER-27342 Do not block unnecessarily on connect... Closed
is related to SERVER-43776 ShardingTest should run the stop proc... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-11-18, Repl 2019-12-02
Participants:
Linked BF Score: 14

 Description   

Currently, ShardingTest calls startSet on each of its shard ReplSetTest instances serially. This means that it is not possible to start up processes from multiple shards at the same time. To make this startup process faster when there are many shards, ShardingTest can start all ReplSetTest instances at the same time, without waiting for one to complete before moving on to the next one. It can then wait for all shard startup procedures to finish. This will allow the startup of all shard ReplSetTest instances to proceed in parallel.
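
The two-phase idea described above (kick off every replica set first, then wait on each) can be sketched in plain JavaScript. This is a minimal model under stated assumptions, not the change made to ShardingTest: `FakeReplSet`, `startAsync`, `awaitStart`, and the startup durations are all hypothetical stand-ins, and the clock is simulated so that the example is deterministic.

```javascript
// Hypothetical stand-in for one shard's replica set. This is NOT the real
// ReplSetTest API; it only models "start now, become ready later".
class FakeReplSet {
    constructor(name, startupMs) {
        this.name = name;
        this.startupMs = startupMs;  // assumed time for the set's processes to come up
        this.readyAt = null;
    }

    // Phase 1: launch the processes and return immediately,
    // recording the simulated time at which they will be ready.
    startAsync(nowMs) {
        this.readyAt = nowMs + this.startupMs;
    }

    // Phase 2: wait until the set is ready and return the simulated clock
    // value after waiting (no wait is needed if it is already ready).
    awaitStart(nowMs) {
        return Math.max(nowMs, this.readyAt);
    }
}

// Serial startup: each set is started and awaited before the next one begins.
function startSerially(sets) {
    let now = 0;
    for (const rs of sets) {
        rs.startAsync(now);
        now = rs.awaitStart(now);
    }
    return now;
}

// Parallel startup: start every set first, then wait for all of them.
function startInParallel(sets) {
    let now = 0;
    for (const rs of sets) {
        rs.startAsync(now);
    }
    for (const rs of sets) {
        now = rs.awaitStart(now);
    }
    return now;
}

// Made-up durations, chosen only for illustration.
const makeSets = () => [
    new FakeReplSet("shard0", 800),
    new FakeReplSet("shard1", 850),
    new FakeReplSet("shard2", 900),
];

const serialMs = startSerially(makeSets());
const parallelMs = startInParallel(makeSets());
console.log(`serial: ${serialMs}ms, parallel: ${parallelMs}ms`);
```

In this model the serial total is the sum of the individual startup times, while the parallel total is bounded by the slowest set, which is why the benefit grows with the number of shards.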



 Comments   
Comment by William Schultz (Inactive) [ 09/Dec/19 ]

We can also see how the profile of the "startupAndInitiate" metric improved:

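Before: [image; see the startupAndInitiate.png attachments listed above]

After: [image; see the startupAndInitiate.png attachments listed above]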

Note that the "num_nodes" metric only accounts for the total number of shard nodes, and does not take into account the size of the config server replica set, which defaults to 3 nodes but may use fewer or more nodes in some cases. I think the performance improvement here is not particularly dramatic because initiation is actually the slowest part of setting up a ReplSetTest.

Comment by William Schultz (Inactive) [ 27/Nov/19 ]

On an idle RHEL 6.2 spawn host, we can see the following performance metrics for the ShardingTest control tests running with a single config server node after these changes:

[root@ip-10-122-0-67 mci]# /opt/mongodbtoolchain/v3/bin/python3 buildscripts/resmoke.py jstests/sharding/shardingtest_control_1_node.js  | grep "ShardingTest startup for.*took"
[js_test:shardingtest_control_1_node] 2019-11-27T19:11:28.316+0000 ShardingTest startup for all nodes took 816ms with 1 config server nodes and 1 total shard nodes.
[root@ip-10-122-0-67 mci]# /opt/mongodbtoolchain/v3/bin/python3 buildscripts/resmoke.py jstests/sharding/shardingtest_control_12_nodes.js  | grep "ShardingTest startup for.*took"
[js_test:shardingtest_control_12_nodes] 2019-11-27T19:11:41.706+0000 ShardingTest startup for all nodes took 869ms with 1 config server nodes and 12 total shard nodes.

This scale factor of 869ms / 816ms ≈ 1.065 is well within the 1.5x target.
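
The ratio can be recomputed directly from the two log lines quoted above. The following is a minimal sketch in plain JavaScript; `parseStartup` and `TARGET` are hypothetical names introduced here for illustration and are not part of resmoke or the mongo shell.

```javascript
// The two "ShardingTest startup" log lines, copied from the output above.
const logLines = [
    "[js_test:shardingtest_control_1_node] 2019-11-27T19:11:28.316+0000 ShardingTest startup for all nodes took 816ms with 1 config server nodes and 1 total shard nodes.",
    "[js_test:shardingtest_control_12_nodes] 2019-11-27T19:11:41.706+0000 ShardingTest startup for all nodes took 869ms with 1 config server nodes and 12 total shard nodes.",
];

// Hypothetical helper: extract the duration and node counts from one log line.
// Returns null if the line does not match the expected message format.
function parseStartup(line) {
    const pattern =
        /startup for all nodes took (\d+)ms with (\d+) config server nodes and (\d+) total shard nodes/;
    const m = pattern.exec(line);
    if (m === null) {
        return null;
    }
    return {tookMs: Number(m[1]), configNodes: Number(m[2]), shardNodes: Number(m[3])};
}

const [small, large] = logLines.map(parseStartup);

// Ratio of the 12-shard-node startup time to the 1-shard-node startup time.
const scaleFactor = large.tookMs / small.tookMs;

// The 1.5x target mentioned in the comment above.
const TARGET = 1.5;

console.log(
    `${large.shardNodes} vs ${small.shardNodes} shard nodes: ` +
    `scale factor ${scaleFactor.toFixed(3)}, within target: ${scaleFactor <= TARGET}`);
```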

Comment by Githook User [ 27/Nov/19 ]

Author:

{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}

Message: SERVER-43773 Start up config server and shard replica sets in parallel in ShardingTest
Branch: master
https://github.com/mongodb/mongo/commit/2c2ead223bfcdb612a55e3fc5a0b4989a3930165

Comment by Githook User [ 26/Nov/19 ]

Author:

{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}

Message: Revert "SERVER-43773 Start up config server and shard replica sets in parallel in ShardingTest"

This reverts commit 72845828cdac26031d66f18ef7e7a4e108d3d178.
Branch: master
https://github.com/mongodb/mongo/commit/ec9a2f13d82f141d8aca9e3df9e9112b722f2563

Comment by Githook User [ 26/Nov/19 ]

Author:

{'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'}

Message: SERVER-43773 Start up config server and shard replica sets in parallel in ShardingTest
Branch: master
https://github.com/mongodb/mongo/commit/72845828cdac26031d66f18ef7e7a4e108d3d178

Comment by Githook User [ 26/Nov/19 ]

Author:

{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}

Message: SERVER-43773 Add log messages in ShardingTest to measure total duration of startup and initiation of shards and config server
Branch: master
https://github.com/mongodb/mongo/commit/e164e7d031a1d4a22a03df3382316f6107ab81c8

Comment by William Schultz (Inactive) [ 03/Oct/19 ]

Yes, that is the idea. It should generally be similar to Judah's POC from SERVER-27342.

Comment by Kaloian Manassiev [ 03/Oct/19 ]

Are you guys planning to make the startSet/stopSet methods asynchronous so that several of them can overlap?

Generated at Thu Feb 08 05:04:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.