[SERVER-62313] Compiling pre-processed sourcecode with CCache loses path information Created: 29/Dec/21  Updated: 29/Oct/23  Resolved: 29/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Spencer Jackson Assignee: Andrew Morrow (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-62227 Perf annotate tool on Linux doesn't s... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Dev Platform 2022-01-24, Dev Platform 2022-03-21, Dev Platform 2022-04-04
Participants:

 Description   

Compiling with scons+ccache+UBSAN results in binaries which do not appear to respect the contents of etc/ubsan.denylist.

python ./buildscripts/scons.py --variables-files=etc/scons/developer_versions.vars --variables-files=etc/scons/mongodbtoolchain_v4_clang.vars --dbg=on --opt=on --allocator=system --sanitize=address,undefined  --link-model=dynamic --linker=lld ICECC= CCACHE=/opt/mongodbtoolchain/v4/bin/ccache MONGO_GIT_HASH="unknown" VERBOSE=1 +message_compressor_manager_test
 
...
 
t":"FullNormalCompression","rep":1,"reps":1}}
{"t":{"$date":"2021-12-29T23:11:41.301Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"thread1","msg":"Running","attr":{"test":"SERVER_28008","rep":1,"reps":1}}
src/third_party/zstandard-1.4.4/zstd/lib/compress/zstd_compress.c:1326:46: runtime error: applying non-zero offset 1 to null pointer
    #0 0x7f5ccee84436 in ZSTD_reset_matchState /home/ubuntu/mongo/src/third_party/zstandard-1.4.4/zstd/lib/compress/zstd_compress.c:1326:46
    #1 0x7f5ccee82b52 in ZSTD_resetCCtx_internal /home/ubuntu/mongo/src/third_party/zstandard-1.4.4/zstd/lib/compress/zstd_compress.c:1527:9
    #2 0x7f5ccee79877 in ZSTD_compressBegin_internal /home/ubuntu/mongo/src/third_party/zstandard-1.4.4/zstd/lib/compress/zstd_compress.c:2946:5

Checking the ccache diagnostics we can see entries like:

[2021-12-29T22:22:41.356333 8593 ] Running real compiler
[2021-12-29T22:22:41.356383 8593 ] Executing /opt/mongodbtoolchain/v4/bin/clang -std=c11 -fasynchronous-unwind-tables -ggdb -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -fno-omit-frame-pointer -fno-strict-aliasing -O2 -march=sandybridge -mtune=generic -mprefer-vector-width=128 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range-compare -Wno-tautological-constant-compare -Wno-tautological-unsigned-zero-compare -Wno-tautological-unsigned-enum-zero-compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-unused-lambda-capture -fstack-protector-strong -fsanitize=address,undefined -fno-omit-frame-pointer -fsanitize-blacklist=etc/ubsan.denylist -fno-sanitize-recover -fno-sanitize=vptr -Qunused-arguments -fPIC -Werror -DPCRE_STATIC -DMONGO_USE_VISIBILITY -D_XOPEN_SOURCE=700 -D_GNU_SOURCE -DADDRESS_SANITIZER -DUNDEFINED_BEHAVIOR_SANITIZER -DBOOST_THREAD_VERSION=5 -DBOOST_THREAD_USES_DATETIME -DBOOST_SYSTEM_NO_DEPRECATED -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS -DBOOST_ENABLE_ASSERT_DEBUG_HANDLER -DBOOST_LOG_NO_SHORTHAND_NAMES -DBOOST_LOG_USE_NATIVE_SYSLOG -DBOOST_LOG_WITHOUT_THREAD_ATTR -DABSL_FORCE_ALIGNED_ACCESS -DBOOST_LOG_DYN_LINK -Ibuild/optdebug/third_party/zstandard-1.4.4/zstd/lib/common -Isrc/third_party/zstandard-1.4.4/zstd/lib/common -Isrc/third_party/zstandard-1.4.4/zstd/lib -x c -c -fcolor-diagnostics -o build/optdebug/third_party/zstandard-1.4.4/zstd/lib/compress/zstd_compress.dyn.o /run/user/1000/ccache-tmp/tmp.cpp_stdout.G03F6m.i

Note that this compiler invocation created build/optdebug/third_party/zstandard-1.4.4/zstd/lib/compress/zstd_compress.dyn.o and consumed /run/user/1000/ccache-tmp/tmp.cpp_stdout.G03F6m.i. Because clang is not running against the true source tree, I believe clang cannot apply the following relative path based suppression from etc/ubsan.denylist:

# TODO(SERVER-49230) Remove this after upgrading to 1.5 per
# https://github.com/facebook/zstd/issues/2110#issuecomment-996146316
[pointer-overflow]
src:src/third_party/zstandard-*/*

I believe CCache chooses to run this way because we've disabled CCACHE_CPP2. According to CCache's docs:

If true, ccache will first run the preprocessor to preprocess the source code (see The preprocessor mode) and then on a cache miss run the compiler on the source code to get hold of the object file.

When that logic is excised, the following invokations results in functioning test binaries:

python ./buildscripts/scons.py --variables-files=etc/scons/developer_versions.vars --variables-files=etc/scons/mongodbtoolchain_v4_clang.vars --dbg=on --opt=on --allocator=system --sanitize=address,undefined  --link-model=dynamic --linker=lld ICECC= CCACHE=/opt/mongodbtoolchain/v4/bin/ccache MONGO_GIT_HASH="unknown" VERBOSE=1 +message_compressor_manager_test
...
{"t":{"$date":"2021-12-29T22:58:18.571Z"},"s":"I",  "c":"TEST",     "id":4680101, "ctx":"thread1","msg":"Result","attr":{"suite":{"name":"ZstdMessageCompressor","tests":2,"fails":0,"asserts":0,"time":{"durationMillis":0},"failures":[]}}}
{"t":{"$date":"2021-12-29T22:58:18.571Z"},"s":"I",  "c":"TEST",     "id":23065,   "ctx":"thread1","msg":"Totals","attr":{"totals":{"name":"TOTALS","tests":17,"fails":0,"asserts":0,"time":{"durationMillis":1},"failures":[]}}}
{"t":{"$date":"2021-12-29T22:58:18.571Z"},"s":"I",  "c":"TEST",     "id":23069,   "ctx":"thread1","msg":"SUCCESS - All tests in all suites passed"}
scons: done building targets.



 Comments   
Comment by Githook User [ 28/Mar/22 ]

Author:

{'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'}

Message: SERVER-62313 Disable ccache cpp2 when sanitizer denylists are in play
Branch: master
https://github.com/mongodb/mongo/commit/fcf8de14cb92fb4ff5874308104ca942ea11b655

Comment by Daniel Moody [ 24/Mar/22 ]

We could also give CCACHE_EXTRAFILES more structure, so you can tag a file as being "pathy" and therefore needing to suppress NOCPP2.

This seems a little over engineered, how many cases are we expecting to support where we need to disable NOCPP2? I like the generator idea and -o idea though.

One caveat is that ccache honors environment variables even over explicit command line arguments (an interesting choice). But we can probably deal with that by just refusing to load the ccache tool if we see anything in env[ENV] that starts with CCACHE_

Ninja would not be able to have that level of control. Its a bit of an edge case I am referring to, so overall I agree with what you are saying. I doubt a developer would set the CCACHE_CPP2 environment variable explicitly, but switching to the -o option would open a hole for confusion.

Comment by Andrew Morrow (Inactive) [ 24/Mar/22 ]

Another advantage of the above generator driven scheme is that it would make the tool a little more robust, since CCACHE_EXTRAFILES wouldn't need to be set before the tool was loaded, and it could also be customized in a cloned Environment and the tool would still do the right thing.

Comment by Andrew Morrow (Inactive) [ 24/Mar/22 ]

daniel.moody - I was thinking about the fact that we already had CCACHE_EXTRAFILES in order to let ccache know about files to hash. I just realized though that cache also allows us to explicitly pass option values on the command line with the -o syntax. So maybe we can simplify and harden things, by moving the handling of run_second_cpp (which is what CCACHE_CPP2 vs CCACHE_NOCPP2) into a generator, which can then also consult CCACHE_EXTRAFILES while generating the action string to decide about run_second_cpp. We could also give CCACHE_EXTRAFILES more structure, so you can tag a file as being "pathy" and therefore needing to suppress NOCPP2. It'd be a little refactor, but maybe for the best? I never liked that we needed to muck around with the `ENV` anyway. One caveat is that ccache honors environment variables even over explicit command line arguments (an interesting choice). But we can probably deal with that by just refusing to load the ccache tool if we see anything in env[ENV] that starts with CCACHE_. Thoughts?

Comment by Andrew Morrow (Inactive) [ 23/Mar/22 ]

OK I think we have the shape of a plan for this one. I think I will 1) provide a customization point to suppress NOCPP2 and enable that customization point when sanitizers are in play, and 2) add guardrails disallowing icecream when sanitizers are in play, but 3) provide a --do-as-i-said flag that suppresses the guardrails in 2.

Comment by Daniel Moody [ 23/Mar/22 ]

Can you confirm that there is no correctness motivation behind CCACHE_NOCPP2=1, only performance?

Yeah just performance, CCACHE's default (CCACHE_CPP2=1, the NO is removed meaning double preprocessing) is different then what we set.  The cost is the time it takes to preprocess all cache misses, which is not insignificant, but also not unbearable.

I don't think we have the ability to constrain on the icecream version for remotes, right? What is the behavior that we require from icecream 1.3? Should we disallow sanitizer builds with icecream until we can enforce 1.3? Or should we just make the ccache related fix and hope for the best when icecream is in play

We can't control it easily or currently. We'd need to setup a dedicated cluster, which I think we intend to anyways. Heres the ticket around the behavior, which also includes the fixing icecream commit. I think disallowing them by default is probably good, but I use icecream with sanitizers and I usually end up not having issues, because I assume most of the cluster is on a newer icecream now-a-days. It might be flexible if we had a force option with a disclaimer maybe? I usually just delete third party build folder if I see issues then rebuild without icecream so most of my sanitizer build is built with icecream.

 

 

Comment by Andrew Morrow (Inactive) [ 23/Mar/22 ]

daniel.moody

But the command line in the reproducer shows GCC, ccache, and no icecream

This is a sanitizer and is using clang I thought? CCACHE_NOCPP2=1 is in play

Yes, you are right, I got tangled up in the permutations.

could we fix this by arranging things so we don't use the CCACHE_NOCPP2=1 option when we are using the sanitizer denylists?

I think this is a pretty good solution, I know you referenced the wiki link, but those were issues were around GCC, and this case we are using clang. So I think there is an entirely new issue here, related to denylist and ccache using preprocessed files out of the source tree.

I think that as long as we are confident that opting into CCACHE_NOCPP2=1 was only ever an optimization, and not something that we needed to do for correctness (and I think your wiki table supports that), then we can go ahead and build the mechanism that lets the sanitizer builds ask that the ccache tool to not enable the NOCPP2 optimization. I don't think that would be too hard. Can you confirm that there is no correctness motivation behind CCACHE_NOCPP2=1, only performance?

Do we know if we also have intrinsic difficulties with honoring the denylist files based on filepath due to icecream, for both the with and without ccache cases?

I think we have issues with both, but both have a fix. For CCACHE we need to not use the CCACHE_NOCPP2=1 in the sanitizer case. For ICECREAM we need to make sure the local and all remotes are using >1.3.

I don't think we have the ability to constrain on the icecream version for remotes, right? What is the behavior that we require from icecream 1.3? Should we disallow sanitizer builds with icecream until we can enforce 1.3? Or should we just make the ccache related fix and hope for the best when icecream is in play? Most engineers aren't doing local sanitizer builds that often, but it does hurt to take away icecream.

Comment by Ryan Egesdahl (Inactive) [ 23/Mar/22 ]

daniel.moody Your last concern should be addressed by BUILD-14609. Until that point, I don't think we can be assured that any of the Icecream build hosts are >1.3.

Comment by Daniel Moody [ 23/Mar/22 ]

It possible CCACHE has fixed this behavior in a new version, I have not looked at that possibility.

Comment by Daniel Moody [ 23/Mar/22 ]

But the command line in the reproducer shows GCC, ccache, and no icecream

This is a sanitizer and is using clang I thought? CCACHE_NOCPP2=1 is in play

could we fix this by arranging things so we don't use the CCACHE_NOCPP2=1 option when we are using the sanitizer denylists?

I think this is a pretty good solution, I know you referenced the wiki link, but those were issues were around GCC, and this case we are using clang. So I think there is an entirely new issue here, related to denylist and ccache using preprocessed files out of the source tree.

Do we know if we also have intrinsic difficulties with honoring the denylist files based on filepath due to icecream, for both the with and without ccache cases?

I think we have issues with both, but both have a fix. CCACHE we need to not use the CCACHE_NOCPP2=1 in the sanitizer case. ICECREAM we need to make sure the local and all remotes are using >1.3.

Comment by Ryan Egesdahl (Inactive) [ 23/Mar/22 ]

acm daniel.moody Keep in mind, one of the problems we were having when implementing this behavior is that we were dealing with mutual dependencies between specific versions Icecream and ccache. I don't think our assumptions need hold in either case anymore, especially with the v4 toolchain, so it's possible that we just need to look at this anew.

Comment by Andrew Morrow (Inactive) [ 23/Mar/22 ]

daniel.moody -

A few follow-up questions:

  • The assertion in this ticket is that it is the presence of CCACHE_NOCPP2=1 that is preventing the denylist entries from having effect, because in that mode the filenames get temp-ified. But the command line in the reproducer shows GCC, ccache, and no icecream. According to your first bullet, we shouldn't actually be using CCACHE_NOCPP2=1 in that setup, so I'm not sure how to reconcile that.
  • If, despite my confusion, it really is CCACHE_NOCPP2=1 that is the issue, and if it is only an optimization, could we fix this by arranging things so we don't use the CCACHE_NOCPP2=1 option when we are using the sanitizer denylists? We could add a new customization variable for our ccache tool, like CCACHE_REQUIRE_TRUE_FILENAMES or something, and then use that to decide about any ccache options that may interfere with filename matching. We would get a less optimized ccache setup when the sanitizers are in play, but it would at least do the right thing.
  • Do we know if we also have intrinsic difficulties with honoring the denylist files based on filepath due to icecream, for both the with and without ccache cases? If disabling CCACHE_NOCPP2 per the above suggestion would fix things I'm interested in trying it out, but less so if we are just going to find that we are immediately foiled by icecream.
Comment by Daniel Moody [ 23/Mar/22 ]

acm for further explanation there are only two suboptimal conditions we are currently supporting due to the bugs mentioned in the wiki link.

  1. if using gcc and ccache without icecream, we can't use the CCACHE_NOCPP2=1 optimization due to gcc bug (this is fixed in v4 gcc)
  2. if using gcc and ccache with icecream, we can't use -fdirectives-only due to icecream bug. -fdirectives-only with ccache gives greater chance of cache hits so is an optimization.
Comment by Daniel Moody [ 23/Mar/22 ]

Yeah, but CCACHE_NOCPP2=1 is an optimization for ccache so the compiler doesn't need to re-preprocess the source files in a cache miss scenario. For example ccache first preprocesses the source as an input in generation of a hash id to check the cache with. If there was a miss, well you already did some work by preprocessing so instead of invoking the normal compiler command, ccache will give the preprocessed source to the compiler. I'm not sure how much time is actually saved.

Comment by Ryan Egesdahl (Inactive) [ 23/Mar/22 ]

acm I think you’re probably correct here. I know from some recent work in the Icecream code that there’s actually quite a bit of room for fixing any problems we might encounter. When we made these original changes, we had no control over the Icecream version, too. Now that we provide our own release, it might make sense to have a more definitive policy here.

Comment by Ryan Egesdahl (Inactive) [ 30/Dec/21 ]

spencer.jackson On second thought, I don't know that simply adding the flag in CCFLAGS is going to help much because the issue is that ccache is doing the compile in a temporary directory, which is throwing off all the paths. We would need a more extensive workaround of what ccache is doing here, if it's possible to do. For now, the SCons cache is probably what you want to use here.

Comment by Ryan Egesdahl (Inactive) [ 29/Dec/21 ]

spencer.jackson We deliberately disabled CCACHE_CPP2 because it doesn't play well in all scenarios - especially when Icecream is in play - and there are too many combinations to consider when setting it up. However, we are planning to explore using -ffile-prefix-map as part of other work where the compile path seems to matter. That would be a preferable solution here as well if it works. Could you please give it a try? It should be available for clang in the v4 toolchain. I think something like -ffile-prefix-map=$(pwd)=/ might work.

Generated at Thu Feb 08 05:54:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.