[SERVER-53952] Building with ninja + ASan poisons the build/install/ directory Created: 21/Jan/21  Updated: 29/Oct/23  Resolved: 02/Apr/21

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.7

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Daniel Moody
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-66384 Enable command line customization of ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Dev Platform 2021-02-08, Dev Platform 2021-02-22, Dev Platform 2021-03-08, Dev Platform 2021-03-22, Dev Platform 2021-04-05
Participants:

 Description   

I use the following two SCons invocations to generate ninja files.

python ./buildscripts/scons.py --modules= --dbg=off --opt=on --link-model=dynamic --variables-files=./etc/scons/mongodbtoolchain_stable_gcc.vars --ninja generate-ninja ICECC=icecc CCACHE=ccache
python ./buildscripts/scons.py --modules= --dbg=on --opt=on --link-model=dynamic --variables-files=./etc/scons/mongodbtoolchain_stable_clang.vars --ninja generate-ninja ICECC=icecc CCACHE=ccache --sanitize=address --allocator=system NINJA_PREFIX=asan VARIANT_DIR=asandebug

I find that I must delete the contents of the build/install/ directory when switching from an ASan build to a non-ASan build.

$ rm -rf build/install/*
$ ninja -f build.ninja -j 200 build/install/bin/wt
[21/21 (100%) 0.033s] Install build/install/bin/wt
$ ninja -f asan.ninja -j 200 build/install/bin/wt
[14/14 (100%) 0.081s] Install build/install/bin/wt
$ ninja -f build.ninja -j 200 build/install/bin/wt
[10/10 (100%) 0.055s] Install build/install/bin/wt
$ ./build/install/bin/wt
./build/install/bin/wt: symbol lookup error: /home/ubuntu/mongo/build/install/bin/../lib/libshim_allocator.so: undefined symbol: __asan_init

(Using the wt tool as an example here because it compiles quickly.)



 Comments   
Comment by Githook User [ 28/Jun/21 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-53952 fix issue when checking ninja flag for ninja DESTDIR

(cherry picked from commit db03ce4c42524ea65537d03f764100722f2f3e9e)
Branch: v4.4
https://github.com/mongodb/mongo/commit/2b177045c26f8cdd0639185d79b494b3b2f0e826

Comment by Githook User [ 28/Jun/21 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-53952 Moved ninja install dir into build variant due to ninja limitations.

(cherry picked from commit 27b41724ed822f1d938e1f90df4adccc2b7d4609)
Branch: v4.4
https://github.com/mongodb/mongo/commit/f8a4c2161a2bcfcca52bf72e64e430723710f4f7

Comment by Daniel Moody [ 09/Apr/21 ]

People from the mailing list are confirming my suspicions that reusing the ninja_log file with different ninja files is dangerous: https://groups.google.com/g/ninja-build/c/Zpi2A2bOyIo

Comment by Githook User [ 05/Apr/21 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-53952 fix issue when checking ninja flag for ninja DESTDIR
Branch: master
https://github.com/mongodb/mongo/commit/db03ce4c42524ea65537d03f764100722f2f3e9e

Comment by Githook User [ 02/Apr/21 ]

Author:

{'name': 'Daniel Moody', 'email': 'daniel.moody@mongodb.com', 'username': 'dmoody256'}

Message: SERVER-53952 Moved ninja install dir into build variant due to ninja limitations.
Branch: master
https://github.com/mongodb/mongo/commit/27b41724ed822f1d938e1f90df4adccc2b7d4609

Comment by Daniel Moody [ 26/Mar/21 ]

This actually is not caused by inherent mtimes problems, its caused because each ninja variant keeps its own logs of output mtimes. This causes the interaction of the ninja variants to play out like this:

  1. After "rm -rf build/install/*", build.ninja sees no outputs existing, so it builds them all and records the mtimes from disk to its log.
  2. asan.ninja has its own output mtime logs, and now instead of seeing no files existing it sees they exist, so asan.ninja thinks it must have built them previously, so it checks its mtime log and sees its recorded output mtimes are older then some of the inputs, those inputs being some of the outputs just rebuilt from build.ninja, hence a smaller set to rebuild and records the new outputs mtimes.
  3. build.ninja now checks its mtime logs and see that some of the inputs (smaller subset of outputs just built from asan.ninja) are newer than its recorded outputs so it rebuilds those for an even smaller subset.
  4. so on an so forth until each variant has output mtimes recorded older then all inputs, where the mtimes end up being staggered like a tcp diagram across the two build logs.

The inherent problem is that ninja checks mtimes from disk for inputs and checks mtimes from its log for outputs, assuming it is the only one building the outputs and therefore can keep track of those output mtimes. Anyways I am pretty sure ninja does not support two build logs building the same set of files. Two ways I can think of to fix this:

  1. acm suggested moving the install files into a variant directory instead of sharing a common install directory, and that completely separates the ninja builds and should work.
  2. If the variants share the same ninja logs, then ninja will correctly rebuild every time each variant is called. This means moving the build/$VARIANT/ninja directory somewhere common, like '#build/ninja'. I tested this and it worked but it was very limited testing only for the scenario described in this ticket. Overall this option seems a little dangerous, as there may be other inconsistencies not accounted for when sharing ninja logs and dep files. 
Comment by Ryan Egesdahl (Inactive) [ 02/Feb/21 ]

I think this is a problem due to a subtlety that Ninja just isn't equipped to handle. What seems to have happened is that libshim_allocator.so has a dependency on an ASAN symbol because it was compiled with ASAN. We don't (and cannot) reflect the dependency on ASAN in the build graph, so when the next Ninja build is run without ASAN, Ninja doesn't know it needs to recompile libshim_allocator.so to remove that dependency. libshim_allocator.so is also at the root of the dependency graph, so nothing you're changing is going to cause it to be rebuilt. However, nearly every binary has a dependency on libshim_allocator.so, so when the dynamic linker tries to load it, it finds a dangling symbol dependency on ASAN that isn't associated with a DT_NEEDED*.

SCons doesn't cause this problem because it knows to rebuild and reinstall libshim_allocator.so when the compiler or its arguments change. Ninja, on the other hand, just compares file modification times and isn't looking at md5sums or compiler argument hashes. It sees the file exists, is newer than build.ninja, and wasn't dependent on anything else it rebuilt, so it assumes nothing needs to be done. I would not be surprised to learn that you are sometimes getting the converse case where an ASAN executable is loading a non-ASAN libshim_allocator.so without a problem. The real issue is that you have two builds that Ninja doesn't know how to differentiate sharing the same location. When you are compiling with different compilers or compiler options with Ninja, you should be installing into different places with INSTALL_DIR or removing build/install before rebuilding because otherwise Ninja will happily pollute the install prefix if you let it.

I am not sure there is a way to prevent install prefix pollution that will be workable with Ninja. If, however, you install your ASAN and non-ASAN builds (as well as builds with different compilers) into different locations - or, alternatively, leverage ccache to manage your build artifacts and just unconditionally remove build/install every time you switch to a new compiler or build type - you should not see this problem occurring.

* This happens because the compiler statically links libasan by default, and __asan_init is a global symbol reference to an initializer. Because libshim_allocator.so doesn't reference any ASAN symbols and doesn't have its own initializer, the only way the dynamic loader can possibly resolve that reference is if the binary has it. Therefore, when libshim_allocator.so is built with ASAN, the executable loading it must also be built with ASAN, or the dynamic loader will fail to resolve the symbol reference. Interestingly, you could probably use a non-ASAN libshim_allocator.so with an ASAN executable without any problems.

I made some changes to library shims in --build-tools=next that don't put any shim libraries without symbols on the linker command line, which prevents them from being a runtime dependency. That would effectively have prevented what you're seeing. But it doesn't fix the larger install prefix pollution issue, so it's not a real solution to your problem.

Comment by Andrew Morrow (Inactive) [ 21/Jan/21 ]

I believe that the issue here is related to the fact that Ninja only uses mtime, with all of the issues that entails. I'll do some further digging to confirm that hypothesis.

Comment by Andrew Morrow (Inactive) [ 21/Jan/21 ]

This issue does not reproduce with plain SCons.

Comment by Andrew Morrow (Inactive) [ 21/Jan/21 ]

I've reproduced this on both master and v4.4, after removing icecream and ccache just to be sure they weren't a factor. The problem actually looks rather severe:

...
[493/493] cp build/asandebug/third_party/wiredtiger/wt build/install/bin/wt
 
$ ninja -f build.ninja -j 200 build/install/bin/wt -v
[1/17] cp src/mongo/embedded/stitch_support/stitch_support.h build/install/include/stitch_support/v1/stitch_support/stitch_support.h
[2/17] cp LICENSE-Community.txt build/install/share/doc/mongo_embedded/LICENSE-Community.txt
[3/17] cp src/mongo/embedded/LICENSE-Embedded.txt build/install/share/doc/mongo_embedded/LICENSE-Embedded.txt
[4/17] cp src/mongo/embedded/mongo_embedded/mongo_embedded.h build/install/include/mongo_embedded/v1/mongo_embedded/mongo_embedded.h
[5/17] cp build/opt/third_party/libshim_allocator.so build/install/lib/libshim_allocator.so
[6/17] cp build/opt/third_party/libshim_unwind.so build/install/lib/libshim_unwind.so
[7/17] cp build/opt/third_party/unwind/libunwind.so build/install/lib/libunwind.so
[8/17] cp build/opt/third_party/gperftools-2.7/libtcmalloc_minimal.so build/install/lib/libtcmalloc_minimal.so
[9/17] cp build/opt/third_party/libshim_zstd.so build/install/lib/libshim_zstd.so
[10/17] cp build/opt/third_party/wiredtiger/libwiredtiger_checksum.so build/install/lib/libwiredtiger_checksum.so
[11/17] cp build/opt/third_party/snappy-1.1.7/libsnappy.so build/install/lib/libsnappy.so
[12/17] cp build/opt/third_party/libshim_zlib.so build/install/lib/libshim_zlib.so
[13/17] cp build/opt/third_party/libshim_snappy.so build/install/lib/libshim_snappy.so
[14/17] cp build/opt/third_party/zlib-1.2.11/libzlib.so build/install/lib/libzlib.so
[15/17] cp build/opt/third_party/zstandard-1.4.3/libzstd.so build/install/lib/libzstd.so
[16/17] cp build/opt/third_party/wiredtiger/libwiredtiger.so build/install/lib/libwiredtiger.so
[17/17] cp build/opt/third_party/wiredtiger/wt build/install/bin/wt
 
$ ninja -f asan.ninja -j 200 build/install/bin/wt -v
[1/10] cp src/mongo/embedded/mongo_embedded/mongo_embedded.h build/install/include/mongo_embedded/v1/mongo_embedded/mongo_embedded.h
[2/10] cp build/asandebug/third_party/libshim_snappy.so build/install/lib/libshim_snappy.so
[3/10] cp build/asandebug/third_party/wiredtiger/libwiredtiger_checksum.so build/install/lib/libwiredtiger_checksum.so
[4/10] cp build/asandebug/third_party/libshim_zstd.so build/install/lib/libshim_zstd.so
[5/10] cp build/asandebug/third_party/libshim_zlib.so build/install/lib/libshim_zlib.so
[6/10] cp build/asandebug/third_party/zlib-1.2.11/libzlib.so build/install/lib/libzlib.so
[7/10] cp build/asandebug/third_party/snappy-1.1.7/libsnappy.so build/install/lib/libsnappy.so
[8/10] cp build/asandebug/third_party/zstandard-1.4.3/libzstd.so build/install/lib/libzstd.so
[9/10] cp build/asandebug/third_party/wiredtiger/libwiredtiger.so build/install/lib/libwiredtiger.so
[10/10] cp build/asandebug/third_party/wiredtiger/wt build/install/bin/wt
 
$ ninja -f build.ninja -j 200 build/install/bin/wt -v
[1/2] cp build/opt/third_party/wiredtiger/libwiredtiger.so build/install/lib/libwiredtiger.so
[2/2] cp build/opt/third_party/wiredtiger/wt build/install/bin/wt
 
$ ninja -f build.ninja -j 200 build/install/bin/wt -v
ninja: no work to do.
 
$ ninja -f asan.ninja -j 200 build/install/bin/wt -v
[1/1] cp build/asandebug/third_party/wiredtiger/wt build/install/bin/wt
 
$ ninja -f build.ninja -j 200 build/install/bin/wt -v
ninja: no work to do.
 
$ ninja -f asan.ninja -j 200 build/install/bin/wt -v
ninja: no work to do.

I'm going to see if I can reproduce this in plain SCons next so we can determine if this is a bug in the transitive install dependency scanner in SCons, or in how those dependencies end up expressed to Ninja.

Comment by Andrew Morrow (Inactive) [ 21/Jan/21 ]

Thanks max.hirschhorn, that definitely shouldn't be happening. We will take a look.

Generated at Thu Feb 08 05:32:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.