[SERVER-49375] Clang with --link-model=dynamic produces binaries that behave incorrectly with respect to decimal128
Created: 08/Jul/20  Updated: 29/Oct/23  Resolved: 14/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug
Priority: Major - P3
Reporter: David Storch
Assignee: Andrew Morrow (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File server49375_buildinstall.tar.gz    
Issue Links:
Related
is related to SERVER-48546 Linker error building with scons --gd... Closed
is related to SERVER-41835 symbol clashes: libgcc vs vendored In... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Dev Platform 2020-07-27
Participants:

 Description   

I have a checkout of master at 0b2b705b0de0e0f7f8ba28604dc5585a5dc5ba0b and the enterprise modules at 2c965327da25f25e974e3402c891ddca82b096ef. I am building on Ubuntu 18.04 with the v3 mongodbtoolchain version of clang. This produces build artifacts that fail a number of basic tests, all of which seem to be related to decimal128. Here's how I'm building:

$ cat ~/.scons/site_scons/mongo_custom_variables.py
# We need these for short_describe below.
import os
import subprocess
 
CCACHE="ccache"
ICECC="icecc"
 
# A little function that gives me a real mongo version, but without the git hash.
def short_describe():
    with open(os.devnull, "r+") as devnull:
        proc = subprocess.Popen("git describe --abbrev=0",
            stdout=subprocess.PIPE,
            stderr=devnull,
            stdin=devnull,
            shell=True)
        return proc.communicate()[0].decode('utf-8').strip()[1:]
 
MONGO_VERSION=short_describe()
MONGO_GIT_HASH="unknown"
 
CC="/opt/mongodbtoolchain/v3/bin/clang"
CXX="/opt/mongodbtoolchain/v3/bin/clang++"
 
$ python3 buildscripts/scons.py --link-model=dynamic --ninja generate-ninja
$ ninja -j 500 build/install/bin/bson_test

The resulting bson_test fails like so:

$ build/install/bin/bson_test
...
{"t":{"$date":"2020-07-08T19:43:27.946Z"},"s":"I",  "c":"TEST",     "id":23068,   "ctx":"main","msg":"FAILURE","attr":{"failedTestsCount":4,"failedSuitesCount":2,"failedTests":["BSONElementIntegerParseTest/ParseIntegerElementToNonNegativeLongRejectsNonIntegralDecimal","BSONElementIntegerParseTest/ParseIntegerElementToLongRejectsNonIntegralDecimal","BSONElementIntegerParseTest/ParseIntegerElementToLongRejectsLargestDecimal","BSONObjCompare/NumberDecimalCompareDoubleNoDoubleRepresentation"]}}
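The FAILURE entry above is a single JSON object, so the failing tests can be pulled out and grouped by suite programmatically. The following is a minimal sketch, assuming the line has been captured as a string; the helper name `failed_tests` and the constant `LOG_LINE` are hypothetical, and the log text is copied from the output above.

```python
import json

# The FAILURE line from the bson_test output above, split across string
# literals only for readability.
LOG_LINE = (
    '{"t":{"$date":"2020-07-08T19:43:27.946Z"},"s":"I","c":"TEST","id":23068,'
    '"ctx":"main","msg":"FAILURE","attr":{"failedTestsCount":4,"failedSuitesCount":2,'
    '"failedTests":['
    '"BSONElementIntegerParseTest/ParseIntegerElementToNonNegativeLongRejectsNonIntegralDecimal",'
    '"BSONElementIntegerParseTest/ParseIntegerElementToLongRejectsNonIntegralDecimal",'
    '"BSONElementIntegerParseTest/ParseIntegerElementToLongRejectsLargestDecimal",'
    '"BSONObjCompare/NumberDecimalCompareDoubleNoDoubleRepresentation"]}}'
)


def failed_tests(line):
    """Return the failedTests list from a FAILURE log line, or [] for other lines."""
    entry = json.loads(line)
    if entry.get("msg") != "FAILURE":
        return []
    return entry["attr"]["failedTests"]


# Each test name has the form "Suite/TestName"; keep the unique suite names.
suites = sorted({name.split("/", 1)[0] for name in failed_tests(LOG_LINE)})
```

Here `suites` holds the two suite names, which matches the `failedSuitesCount` of 2 reported in the log entry.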

These are the two failing assertions:

I haven't checked whether the problem still repros without ninja or without icecream. I do know that it does not repro with static linking and it does not repro with gcc.



 Comments   
Comment by Andrew Morrow (Inactive) [ 15/Jul/20 ]

The issue we filed has been closed in favor of https://bugs.llvm.org/show_bug.cgi?id=44842.

Comment by Githook User [ 14/Jul/20 ]

Author: Andrew Morrow (acmorrow) <acm@mongodb.com>

Message: SERVER-49375 Disable lld for --link-model=dynamic builds
Branch: master
https://github.com/mongodb/mongo/commit/133fad4c0cbeee8d4872c504aeab153fb7364f09

Comment by Andrew Morrow (Inactive) [ 10/Jul/20 ]

I filed a bug with LLVM and wrote up a minimal repro, see the above links.

Comment by Andrew Morrow (Inactive) [ 09/Jul/20 ]

The issue appears to be an inconsistency in how lld does symbol resolution. Neither ld.gold nor the BFD linker behaves as ld.lld does, in what is admittedly a complex edge case. In slightly more detail, we expect that when libbase is on the link line, the linker will not attempt to resolve its undefined symbols. It appears that lld actually does try to do so, and it ends up resolving them against the static libgcc, despite the version script adornments that should prevent it from doing so.

I think the easiest solution for now is to disable -fuse-ld=lld in --link-model=dynamic builds and to file a follow-up SERVER ticket to continue the investigation with upstream. Hopefully we can eventually re-enable -fuse-ld=lld.
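The proposed rule can be sketched as a small decision function. This is an illustration only, not the actual SConstruct change: the function name `choose_linker_flag` and its parameters are hypothetical, and treating every link model whose name starts with "dynamic" the same way is an assumption.

```python
from typing import Optional


def choose_linker_flag(link_model: str, lld_available: bool) -> Optional[str]:
    """Return a -fuse-ld flag, or None to keep the toolchain's default linker."""
    # Assumption: any dynamic link model skips lld because of the symbol
    # resolution behavior described in this comment.
    if link_model.startswith("dynamic"):
        return None
    # For other link models, prefer lld when the toolchain provides it.
    return "-fuse-ld=lld" if lld_available else None
```

Returning None rather than naming another linker leaves the choice to the compiler driver, which keeps the sketch independent of which alternative linkers are installed.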

Comment by Andrew Morrow (Inactive) [ 08/Jul/20 ]

As a temporary workaround until we address the root cause, if you are affected by this bug you can add LINKFLAGS=-fuse-ld=gold to your SCons invocation.
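For builds driven by a variables file like the one in the description, the same workaround could presumably be set there rather than on the command line. A minimal sketch, assuming the file at ~/.scons/site_scons/mongo_custom_variables.py is picked up by the build and that LINKFLAGS is accepted there the same way as on the SCons command line:

```python
# Hypothetical addition to ~/.scons/site_scons/mongo_custom_variables.py.
# Asks the compiler driver to link with gold instead of lld; intended as a
# temporary setting until the root cause is addressed.
LINKFLAGS = "-fuse-ld=gold"
```

The ninja file would need to be regenerated after changing this setting so that the new linker flag reaches the link commands.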

Comment by David Storch [ 08/Jul/20 ]

I have attached file server49375_buildinstall.tar.gz which is an archive of my build/install directory immediately after reproducing this bug, as requested by acm.

Generated at Thu Feb 08 05:19:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.