[SERVER-70839] Spawning dynamically linked mongod processes takes ~30 secs on EVG Created: 01/Aug/22  Updated: 14/Dec/22  Resolved: 14/Dec/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Tommaso Tocci Assignee: [DO NOT ASSIGN] Backlog - Server Development Platform Team (SDP) (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-70797 Reduce mongod startup time by using l... Closed
Related
related to SERVER-50476 gdb.lookup_type on dynamically linked... Closed
related to SERVER-68475 Find solution to relocation overflow ... Closed
related to SERVER-55651 Migrate many non-shipping builders to... Closed
related to SERVER-66014 switch testing build targets to dynam... Closed
related to SERVER-70797 Reduce mongod startup time by using l... Closed
is related to SERVER-9761 Mongo executables should be built wit... Closed
Assigned Teams:
Server Development Platform
Participants:

 Description   

Motivation for Request

Considerably reduce core server EVG patch time.

Context

As part of my Skunkworks project I tried to understand why starting up a sharded cluster using ShardingTest takes so long. Surprisingly, I discovered that just spawning the mongod processes can take up to ~30 secs on Evergreen.

By "spawning" I mean the time that passes from when we call execve to the time we execute the first instruction in mongod_main.
For instance, in this patch it took ~34 secs to spawn 1 mongod process:

[js_test:shardingtest_control_12_nodes] d22050| XOXO execve at time: Mon Jul 18 14:51:16 2022
[js_test:shardingtest_control_12_nodes] d22050| XOXO mongod_main start: Mon Jul 18 14:51:25 2022
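
The same kind of measurement can be approximated outside of mongod. The following is a minimal sketch, not the instrumentation used in the patch: it assumes a Python child process as a stand-in for mongod, and the function name measure_spawn_latency is hypothetical. It records the wall-clock time just before the spawn and compares it with a timestamp printed by the child's first statement, so the result also includes fork and interpreter startup rather than only the execve-to-main window.

```python
import subprocess
import sys
import time


def measure_spawn_latency() -> float:
    """Seconds between just before the spawn and the child's first statement.

    Hypothetical helper: the child is a Python interpreter, used here as a
    stand-in for mongod, which would print its own timestamp in mongod_main.
    """
    # The child reports its wall-clock time as soon as its own code runs.
    child_code = "import time; print(repr(time.time()))"
    before = time.time()
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    child_start = float(result.stdout.strip())
    return child_start - before


if __name__ == "__main__":
    print(f"spawn latency: {measure_spawn_latency():.3f} s")
```

For a dynamically linked binary, most of this interval is spent in the dynamic loader, which is why it does not show up in timings taken inside the process itself.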

As of today we run ~883 tests in each of the 10 sharding suites, and each test takes ~30 secs just to spawn the processes required to set up the cluster. So, accounting only for sharding tests, we are spending at least 73 AWS machine hours per patch per variant just to spawn those processes.
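
The 73-hour figure follows directly from the numbers above. A short sketch of the arithmetic (the constant and function names are made up for illustration; the values are the ones quoted in this ticket):

```python
TESTS_PER_SUITE = 883        # approximate number of tests per sharding suite
SHARDING_SUITES = 10         # number of sharding suites
SPAWN_SECONDS_PER_TEST = 30  # approximate spawn overhead per test


def spawn_machine_hours(tests_per_suite: int, suites: int, spawn_seconds: float) -> float:
    """Total time spent spawning processes, in machine hours."""
    return tests_per_suite * suites * spawn_seconds / 3600


hours = spawn_machine_hours(TESTS_PER_SUITE, SHARDING_SUITES, SPAWN_SECONDS_PER_TEST)
print(f"{hours:.1f} machine hours per patch per variant")  # prints 73.6
```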

It seems we have a lot of room for improvement here.



 Comments   
Comment by Andrew Morrow (Inactive) [ 25/Oct/22 ]

We need to do a little thinking about exactly which scenarios admit dropping -z,now, but I agree we should do this. I also expect that --experimental-optimization=+{vishidden,fnsi} may offer a further 30% reduction in startup times.

Comment by Tommaso Tocci [ 25/Oct/22 ]

alexander.neben@mongodb.com I already did some comparative analysis and it seems that lazy symbol resolution would reduce overall EVG patch time.

You can analyze the result by comparing:

On the RHEL 8.0 variant I observed task run time reductions ranging between 5% and 20%.

Comment by Tommaso Tocci [ 24/Oct/22 ]

andrew.morrow@mongodb.com by removing the -z now linker flag we get a ~30% startup time improvement. I've opened DAG-2230 to introduce the change.
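
To confirm whether a given binary was linked with -z now, one option is to inspect its dynamic section, for example with readelf -d. The sketch below assumes the usual GNU readelf text output, where eager binding shows up as BIND_NOW in the FLAGS entry, NOW in the FLAGS_1 entry, or a standalone BIND_NOW entry. The function name is_bind_now and the sample text are illustrative and not taken from an actual mongod build.

```python
import re


def is_bind_now(readelf_dynamic_output: str) -> bool:
    """Return True if `readelf -d` output indicates eager (-z now) binding."""
    for line in readelf_dynamic_output.splitlines():
        # DT_FLAGS entry carrying DF_BIND_NOW
        if "(FLAGS)" in line and "BIND_NOW" in line:
            return True
        # DT_FLAGS_1 entry carrying DF_1_NOW
        if "(FLAGS_1)" in line and re.search(r"\bNOW\b", line):
            return True
        # Standalone DT_BIND_NOW entry emitted by some linkers
        if "(BIND_NOW)" in line:
            return True
    return False


# Illustrative dynamic-section excerpts (assumed format, not real mongod output).
SAMPLE_EAGER = """
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
"""

SAMPLE_LAZY = """
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000014 (PLTREL)             RELA
"""

if __name__ == "__main__":
    print("eager sample:", is_bind_now(SAMPLE_EAGER))
    print("lazy sample:", is_bind_now(SAMPLE_LAZY))
```

With lazy binding, symbols reached through the PLT are resolved on first call instead of all at once at load time, which is where the startup saving would come from.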

Comment by Alex Neben [ 04/Aug/22 ]

This might fall more on SDP than DAG and there are a ton of implications here.
For example, recently we just "ran out of binary space" and can no longer link statically without some other tradeoffs.

Thank you for getting us this data and I hope there is something we can do about it.
