[SERVER-60426] Spurious rebuilds of the intel decimal library when using SCons Created: 04/Oct/21  Updated: 29/Oct/23  Resolved: 23/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.1

Type: Bug Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: Daniel Moody
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Dev Platform 2022-03-07, Dev Platform 2022-03-21, Dev Platform 2022-04-04
Participants:

 Description   

It seems that the fix for SERVER-55165 may in fact be causing spurious rebuilds for pure SCons builds. I suspect that the add_to_implicit call is causing the dependency list to oscillate from build to build.
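To make the suspected failure mode concrete, here is a minimal toy model of how an oscillating dependency list would force a rebuild on every run. It is only a sketch under stated assumptions: it does not use or reproduce SCons internals, and the names `needs_rebuild`, `simulate`, `stable_deps`, `oscillating_deps`, and the file names are all hypothetical.

```python
def needs_rebuild(stored_deps, current_deps):
    """Toy staleness rule: rebuild when the recorded list differs from the current one."""
    return stored_deps != current_deps


def simulate(builds, compute_deps):
    """Run `builds` consecutive builds and report which of them rebuilt the target."""
    stored = None  # nothing is recorded before the first build
    rebuilt = []
    for _ in range(builds):
        current = compute_deps(stored)
        rebuilt.append(needs_rebuild(stored, current))
        stored = current  # record what this build saw, for the next comparison
    return rebuilt


def stable_deps(stored):
    """Well-behaved case: the dependency list is the same on every build."""
    return ["a.h", "b.h"]


def oscillating_deps(stored):
    """Hypothetical bad case: an extra dependency is added only when the
    previous build did not record it, so the list flips on every run."""
    base = ["a.h", "b.h"]
    extra = "generated.h"
    if stored is not None and extra in stored:
        return base
    return base + [extra]


print(simulate(4, stable_deps))       # first build only
print(simulate(4, oscillating_deps))  # every build
```

In this model the stable case rebuilds once and is then up to date, while the oscillating case never settles, which matches the symptom described here. Whether add_to_implicit produces exactly this pattern is the open question of this ticket.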



 Comments   
Comment by Githook User [ 21/Mar/22 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-60426 stop inteldecimal script from setting dependencies in generator.
Branch: master
https://github.com/mongodb/mongo/commit/473b7ffe273c3cfde4e8ef5b4fda73d6742a6e5d

Comment by Daniel Moody [ 15/Mar/22 ]

acm, yeah, if I can repro it, I should be able to discover the details of exactly what's happening and form a better solution. Thanks!

Comment by Andrew Morrow (Inactive) [ 15/Mar/22 ]

daniel.moody -

This reproduces for me every time on macOS with the following SCons invocation:

$ python buildscripts/scons.py --variables-files=etc/scons/xcode_macosx.vars --variables-files=etc/scons/developer_versions.vars --dbg=on --opt=on --link-model=dynamic --implicit-cache --cache --build-fast-and-loose --visibility-support=on --install-action=hardlink --cxx-std=20 --experimental-optimization=\* --experimental-runtime-hardening=\* --debug=time '$DESTDIR/lib/libbase.dylib'

After every run, I see it pulling state from cache:

Retrieved `build/cached/third_party/IntelRDFPMathLib20U1/libintel_decimal128.dylib' from cache
Retrieved `build/cached/third_party/IntelRDFPMathLib20U1/libintel_decimal128.tbd' from cache
Retrieved `build/cached/third_party/IntelRDFPMathLib20U1/libintel_decimal128.tbd.no_uuid' from cache
Command execution time: build/cached/third_party/IntelRDFPMathLib20U1/libintel_decimal128.tbd.no_uuid: 0.400576 seconds

Now, there is a lot going on in that build, so I validated that I can see the same thing on Linux with a simpler build:

$ python3 buildscripts/scons.py --variables-files= --variables-files=etc/scons/mongodbtoolchain_stable_gcc.vars --variables-files=etc/scons/developer_versions.vars --dbg=on --opt=on --link-model=dynamic '$DESTDIR/lib/libbase.so' ICECC=icecc -j300 --implicit-cache

Again, on re-running that command a second time, I see the intel library get rebuilt:

scons: done reading SConscript files.
scons: Building targets ...
Linking build/optdebug/third_party/IntelRDFPMathLib20U1/libintel_decimal128.so
scons: `build/install/lib/libbase.so' is up to date.
scons: done building targets.
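For anyone trying to reproduce this, a small helper can flag a spurious rebuild by scanning the output of a second, supposedly no-op run. The sketch below is an assumption-laden convenience, not part of the MongoDB build scripts: it only recognizes the two line shapes quoted in this ticket ("Linking ..." and "Retrieved `...' from cache"), and the names `ACTION_PATTERNS`, `rebuilt_targets`, and `SAMPLE_OUTPUT` are hypothetical.

```python
import re

# Line shapes taken from the logs quoted in this ticket. Other build actions
# print differently and would need their own patterns.
ACTION_PATTERNS = [
    re.compile(r"^Linking (?P<target>\S+)$"),
    re.compile(r"^Retrieved `(?P<target>[^']+)' from cache$"),
]


def rebuilt_targets(scons_output):
    """Return the targets that were relinked or pulled from cache, in order."""
    targets = []
    for line in scons_output.splitlines():
        line = line.strip()
        for pattern in ACTION_PATTERNS:
            match = pattern.match(line)
            if match:
                targets.append(match.group("target"))
                break
    return targets


SAMPLE_OUTPUT = """\
scons: done reading SConscript files.
scons: Building targets ...
Linking build/optdebug/third_party/IntelRDFPMathLib20U1/libintel_decimal128.so
scons: `build/install/lib/libbase.so' is up to date.
scons: done building targets.
"""

for target in rebuilt_targets(SAMPLE_OUTPUT):
    print("rebuilt on a no-op run:", target)
```

An empty result on the second run means the build was a true no-op; anything else is a candidate spurious rebuild.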

But, and here is a critical insight: if I drop the --implicit-cache flag, this issue goes away:

$ python3 buildscripts/scons.py --variables-files= --variables-files=etc/scons/mongodbtoolchain_stable_gcc.vars --variables-files=etc/scons/developer_versions.vars --dbg=on --opt=on --link-model=dynamic '$DESTDIR/lib/libbase.so' ICECC=icecc -j300
...
scons: done reading SConscript files.
scons: Building targets ...
scons: `build/install/lib/libbase.so' is up to date.
scons: done building targets.

Meanwhile, if I take out the call to add_to_implicit in src/third_party/IntelRDFPMathLib20U1/SConscript, the issue goes away (after one more spurious rebuild, with subsequent builds always up to date), even if I use --implicit-cache.

I didn't realize that --implicit-cache was necessary to see this issue when I filed it. I almost always use that flag since it does significantly speed up pure SCons builds.

Maybe the fact that both --implicit-cache and the call to add_to_implicit are needed to observe these rebuilds is enough to point to a root cause. Do you want to take the ticket back to pursue that angle?

Generated at Thu Feb 08 05:49:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.