[SERVER-72273] Allow for dynamic loose build mode to prevent relinking Created: 20/Dec/22 Updated: 02/Feb/24 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matthew Saltz (Inactive) | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Assigned Teams: |
Build
|
| Participants: |
| Description |
|
When I make a modification to src/mongo/db/db_raii.cpp and recompile the server, it generates 245 "compile tasks" and takes about 2 minutes. It seems like a change to one cpp file should not require all this extra work, especially since I'm using dynamic linking. Here's a sample of the tasks generated:
Here's the function I used to build debug.ninja:
For a full repro this should do it:
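For a rough sense of the measurement, here is a minimal Python sketch that touches db_raii.cpp and times the incremental rebuild; the `debug.ninja` file name and the `install-mongod` target are assumptions for illustration:

```python
# Hypothetical helper: touch one source file and time the incremental rebuild.
import pathlib
import subprocess
import time

source = pathlib.Path("src/mongo/db/db_raii.cpp")
source.touch()  # bump the mtime so ninja treats the file as dirty

start = time.monotonic()
# "debug.ninja" and "install-mongod" are assumed names for this sketch.
subprocess.run(["ninja", "-f", "debug.ninja", "install-mongod"], check=True)
print(f"incremental rebuild took {time.monotonic() - start:.1f}s")
```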
|
| Comments |
| Comment by Matt Broadstone [ 19/Jan/24 ] | ||||||||||
|
Can we please reconsider this feature, or something that addresses the original ask? It sounds like we have a few options, which might be:
I'm making a small change to db_s_config_server_test and the link time is ~65s right now, which is just long enough for me to get distracted and read some other Jira ticket or Slack message while waiting to recompile my unit tests. Working with unit tests should be a joyful, fast, iterative experience. What can we do to move in that direction?
| ||||||||||
| Comment by Daniel Moody [ 20/Dec/22 ] | ||||||||||
|
I think a good solution here is a command line option that allows very loose linking, like `--link-model=dynamic-loose`, where libraries don't get relinked just because shared objects they depend on have changed. This would skip any build-time symbol resolution checks, and you would only see undefined references at runtime. | ||||||||||
| Comment by Daniel Moody [ 20/Dec/22 ] | ||||||||||
I don't know where you got this figure from, but it just depends on the levels of dependency in the particular situation. Looking at the graph paths, I used networkx to print the longest dependency path between mongod and libshard_role (where db_raii is linked in), and it is 30 levels deep; that is most likely the bottleneck. Those links must be done consecutively as 30 separate links. In a static build, the .a libraries are not dependent on each other, so more concurrency is possible just among the libraries themselves (note: not counting Programs). db_raii.cpp just happens to be somewhat deep in the dependency tree; if you chose a different file it could be worse or better depending on its location. For example, touching a source file in the shell:
The premise of the dynamic link model is not to improve incremental rebuilds. It was originally intended to reduce memory pressure and therefore allow greater linking concurrency. In a full build we link hundreds of binaries, and in a static link build each link can take somewhere between 5GB and 10GB of memory, which can easily OOM a system. The dynamic link model considerably reduces that memory use and is generally faster than the equivalent static link (because it can skip transitive symbol references), but in some cases it can require more links, depending on the situation.
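As a minimal sketch of the kind of longest-path analysis described above; the library names and edges here are made up for illustration, since the real graph comes from the build's libdeps data:

```python
import networkx as nx

# Hypothetical dependency edges; the real graph is generated from the
# build's libdeps metadata rather than written out by hand.
g = nx.DiGraph()
g.add_edges_from([
    ("mongod", "libquery_exec.so"),
    ("mongod", "libshard_role.so"),
    ("libquery_exec.so", "libshard_role.so"),
    ("libshard_role.so", "libbase.so"),
])

# Longest chain of libraries between the binary and the library that owns
# the changed source file; each hop is a link that must run consecutively.
longest = max(nx.all_simple_paths(g, "mongod", "libshard_role.so"), key=len)
print(f"{len(longest) - 1} consecutive links: " + " -> ".join(longest))
```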
| ||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 20/Dec/22 ] | ||||||||||
|
I ran an experiment doing just static linking. It takes ~13 seconds. Is there a reason to use dynamic linking if it's not effective in the incremental case?
Edit:
The dynamic linking experiment I ran covered both touching a file (the -j1 run) and editing the file (the -j300 run). For a better apples-to-apples comparison:
| ||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 20/Dec/22 ] | ||||||||||
|
I've added my log files from `ninja -v` when recompiling after touching only db_raii.cpp. | ||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 20/Dec/22 ] | ||||||||||
How parallel are they? The numbers aren't adding up. I'm able to reproduce matthew.saltz@mongodb.com's observation. When I run with -j1 as a baseline I get ~3 minutes:
When I run with -j300 (certainly larger than whatever pool size ninja is configured to use), that only brings the runtime down to ~2 minutes. I'd expect, for example, at least 10 parts to be processed in parallel, which would bring the recompile down from ~180 seconds to ~20.
I agree with the observation that tens of milliseconds is typical, but I also see some 1+ second blips (from the -j1 logs):
To be clear about what my asks are here:
| ||||||||||
| Comment by Daniel Moody [ 20/Dec/22 ] | ||||||||||
The link steps happen in parallel. The install steps (rm + ln) also run in parallel and are relatively fast (tens of milliseconds). | ||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 20/Dec/22 ] | ||||||||||
|
Is there an option that lets the rm + ln/install steps happen in parallel? Also curious if there's a correctness concern there. | ||||||||||
| Comment by Daniel Moody [ 20/Dec/22 ] | ||||||||||
|
We use build-time symbol resolution: we explicitly create these dependencies and relink all the libraries to recheck that every symbol still resolves correctly, so there are no issues at runtime. When you change a library that other libraries link against, everything is rechecked to make sure that all symbol references are satisfied when the binary runs.
That said, the overuse of PUBLIC libdeps means there are unnecessary dependencies and links. We could add a command line option so that you can build without -z,defs (build-time symbol resolution) and remove those dependencies, so nothing is relinked besides the library whose cpp you changed. |
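A minimal sketch, assuming an SCons-style build, of what such an option could look like; the option name, choices, and flag handling below are illustrative, not the actual MongoDB SConstruct:

```python
# Sketch of a hypothetical "dynamic-loose" link model in an SConstruct.
# AddOption/GetOption/Environment are provided by SCons in this context.

AddOption(
    "--link-model",
    dest="link_model",
    type="choice",
    choices=["static", "dynamic", "dynamic-loose"],
    default="dynamic",
    help="How to link libraries; 'dynamic-loose' defers symbol checks to runtime.",
)

env = Environment()
link_model = GetOption("link_model")

if link_model != "dynamic-loose":
    # Strict modes: -z,defs makes the linker fail on any unresolved symbol,
    # which is what forces dependent libraries to be relinked whenever a
    # library they link against changes.
    env.Append(SHLINKFLAGS=["-Wl,-z,defs"])
# In the loose mode the flag is simply omitted (and, in a real build, the
# libdeps layer would also stop registering dependent shared objects as
# relink prerequisites), so undefined references only surface at runtime.
```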