[SERVER-27906] mutable_bson_test failure on ppc64le
Created: 03/Feb/17  Updated: 13/Feb/17  Resolved: 13/Feb/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.2
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Marek Skalický
Assignee: Andrew Morrow (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ppc64le system


Attachments: mutable_bson_test-output.txt
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

~]$ cat variables.list
CCFLAGS="-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mcpu=power8 -mtune=power8"
LINKFLAGS="-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-z,noexecstack -Wl,--reduce-memory-overheads,--no-keep-memory"
CPPDEFINES="BOOST_OPTIONAL_USE_SINGLETON_DEFINITION_OF_NONE ASIO_STANDALONE ASIO_SEPARATE_COMPILATION"
VERBOSE=1
~]$ scons all -j4 --mmapv1=on --wiredtiger=on --ssl --nostrip --disable-warnings-as-errors --variables-files=variables.list

Sprint: Platforms 2017-02-13
Participants:

 Description   

build/opt/mongo/bson/mutable/mutable_bson_test started to fail on ppc64le in 3.4.2 (in 3.4.1 it worked well). Specifically, the "EncodingEquivalenceRegex" subtest is failing.

It is not failing on other platforms.

2017-02-03T13:01:34.758+0000 I -        [main] **************************************************
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.778+0000 2017-02-03T13:01:34.758+0000 I -        [main] ArrayAPI                       | tests:    1 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.778+0000 2017-02-03T13:01:34.758+0000 I -        [main] DecimalType                    | tests:    3 | fails:    0 | assert calls:          0 | time secs:  0.002
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.778+0000 2017-02-03T13:01:34.758+0000 I -        [main] Document                       | tests:   40 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.778+0000 2017-02-03T13:01:34.758+0000 I -        [main] DocumentComparison             | tests:    9 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.778+0000 2017-02-03T13:01:34.758+0000 I -        [main] DocumentInPlace                | tests:   27 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.779+0000 2017-02-03T13:01:34.758+0000 I -        [main] Documentation                  | tests:    4 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.779+0000 2017-02-03T13:01:34.758+0000 I -        [main] Element                        | tests:   11 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.779+0000 2017-02-03T13:01:34.758+0000 I -        [main] OIDType                        | tests:    2 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.779+0000 2017-02-03T13:01:34.758+0000 I -        [main] SafeNumType                    | tests:    4 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.779+0000 2017-02-03T13:01:34.758+0000 I -        [main] Serialization                  | tests:    1 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.779+0000 2017-02-03T13:01:34.758+0000 I -        [main] TimestampType                  | tests:    3 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.780+0000 2017-02-03T13:01:34.758+0000 I -        [main] TopologyBuilding               | tests:   16 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.780+0000 2017-02-03T13:01:34.758+0000 I -        [main] TypeSupport                    | tests:   21 | fails:    1 | assert calls:          0 | time secs:  0.002
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.780+0000 	EncodingEquivalenceRegex	Expected: identical(thing, mmb::ConstElement(a).getValue()) @src/mongo/bson/mutable/mutable_bson_test.cpp:2276
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.780+0000 2017-02-03T13:01:34.758+0000 I -        [main] UnorderedEqualityChecker       | tests:    8 | fails:    0 | assert calls:          0 | time secs:  0.000
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.780+0000 2017-02-03T13:01:34.758+0000 I -        [main] TOTALS                         | tests:  150 | fails:    1 | assert calls:          0 | time secs:  0.004
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.780+0000 2017-02-03T13:01:34.758+0000 I -        [main] Failing tests:
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.781+0000 2017-02-03T13:01:34.758+0000 I -        [main] 	 TypeSupport/EncodingEquivalenceRegex Failed
[cpp_unit_test:mutable_bson_test] 2017-02-03T13:01:34.781+0000 2017-02-03T13:01:34.758+0000 I -        [main] FAILURE - 1 tests in 1 suites failed
[executor:cpp_unit_test:job0] 2017-02-03T13:01:34.781+0000 mutable_bson_test ran in 0.21 seconds.

Full output of the test is attached.



 Comments   
Comment by Andrew Morrow (Inactive) [ 13/Feb/17 ]

Hi mskalick - Given that, I think it is appropriate to close this issue, since the current hypothesis is a mis-compile by GCC 7. If, however, your investigation reveals anything different (e.g. there is some subtle UB in our code), please feel free to re-open the issue or otherwise bring it back to our attention.

Comment by Marek Skalický [ 09/Feb/17 ]

I compiled the test with optimizations enabled (the test fails). I then removed the element.o file, reran the gcc command for that file (the same one scons invokes) with the optimization level changed to -O0, and ran the scons command again, which only regenerated build/7a67446c/mongo/bson/mutable/libmutable_bson.a and relinked the test. After that the test passes. So only element.o was recompiled to get the test passing.
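
The single-object rebuild described above can be sketched as a small shell helper. This is only an illustration: with_opt_level is a hypothetical name, and the compile line passed to it would have to be copied from the scons output (VERBOSE=1 in variables.list makes scons print it). The command in the usage note is shortened and made up.

```shell
# Rewrite a compiler command line so that every -O<level> flag is replaced
# by the requested level. Usage: with_opt_level LEVEL COMMAND [ARGS...]
with_opt_level() {
    level=$1
    shift
    out=
    for word in "$@"; do
        case $word in
            # Match the usual GCC optimization switches only.
            -O|-O[0-3sg]|-Ofast) word=-O$level ;;
        esac
        # Join the words with single spaces.
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}
```

For example, `with_opt_level 0 g++ -O2 -g -c element.cpp -o element.o` prints `g++ -O0 -g -c element.cpp -o element.o`. Words containing spaces are not re-quoted, so the output is meant for inspection or for simple command lines.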

Comment by Andrew Morrow (Inactive) [ 08/Feb/17 ]

That is somewhat strange - element.cpp is a pretty small and simple file. It does include a fair amount of other code, and calls into document.cpp, which is much more complex. What leads you to conclude that it is a mis-compilation of element.cpp?

Comment by Marek Skalický [ 08/Feb/17 ]

Update: The failure is caused by optimizations applied while compiling build/opt/mongo/bson/mutable/element.o. With -O1 the test fails; with -O0 it works fine.
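
Since -O0 works and -O1 fails, one possible way to narrow this down further is to keep -O1 and switch off its individual optimization passes one at a time when compiling element.o. A rough sketch, assuming the `gcc -Q --help=optimizers` listing prints each flag followed by `[enabled]` or `[disabled]`. The helper names are hypothetical, and not every pass has a working -fno- form, so this is a starting point rather than a guaranteed procedure.

```shell
# Read `gcc -Q --help=optimizers -O1` style output on stdin and print
# only the -f flags that are reported as enabled.
enabled_flags() {
    awk '$1 ~ /^-f/ && $2 == "[enabled]" { print $1 }'
}

# Turn -ffoo into -fno-foo.
negate_flag() {
    printf '%s\n' "$1" | sed 's/^-f/-fno-/'
}

# Print one candidate flag set per enabled pass (does not run the compiler).
candidate_flag_sets() {
    enabled_flags | while read -r flag; do
        printf '%s %s\n' -O1 "$(negate_flag "$flag")"
    done
}
```

Running `gcc -Q --help=optimizers -O1 | candidate_flag_sets` lists the flag sets to try. Recompiling only element.o with each set and rerunning the test would then show whether disabling a single pass makes the failure go away.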

Comment by Marek Skalický [ 07/Feb/17 ]

On other architectures it works fine, so it is a ppc64le-only issue.

The unit tests pass even on ppc64be, so I guess it is a bug in gcc. Any idea how to get a reproducer (i.e. how to locate the issue)?

Comment by Andrew Morrow (Inactive) [ 07/Feb/17 ]

Hi mskalick -

It is certainly possible that mutable bson contains a latent defect that has been exposed by GCC 7. If that is the case, I'd expect it to be some sort of subtle undefined behavior. Another possibility is that we are exposing a bug in GCC.

Do you get this same behavior on x86_64 Fedora 26 GCC 7 systems? We don't happen to have any Fedora 26 ppc64le machines available, or any ppc64le that we can conveniently repurpose, so it is a little difficult for us to track this down on that platform. I think knowing whether this is arch specific would be a good first step.

Thanks for the notes on the --param= stuff.

Comment by Marek Skalický [ 07/Feb/17 ]

It is caused by the new gcc 7.0.1.

It seems that src/mongo/bson/mutable/element.cpp (or included headers) is causing the problem. After rebuilding build/opt/mongo/bson/mutable/element.o with gcc 6.3.1 the unit test works.

Do you have any idea what can cause the problem? What parts of element.cpp are used in EncodingEquivalenceRegex test?

"We don't happen to set --param=ssp-buffer-size=4, so if you have some guidance on that parameter we would be interested."

I've asked, and the answer was: "It's a historic setting for -fstack-protector. Everything should use -fstack-protector-strong these days, so it should no longer be necessary."

Comment by Marek Skalický [ 06/Feb/17 ]

"though there are several RHEL specific things in there that I'm not sure I understand. I also see some CPPDEFINES that are suggestive that you are using the system versions of libraries, but I don't see any --use-system-x calls. Can you please confirm that the SCons invocation is complete?"

Actually it is Fedora 26, not RHEL.

I was building with the --use-system-x options, but I get the same result without them (the command in the description of this bug is correct). I only forgot to remove the CPPDEFINES; I'll try without them.

"I also notice that you are building with --disable-warnings-as-errors. What toolchain or GCC version are you using here?"

It is gcc 7.0.1-0.5.fc26. I have been using this option since gcc 5.0.0 (SERVER-17235); I'll check whether it is still needed.

There is no change in the Fedora mongodb packaging (spec file) between the successful 3.4.1 build and the failed 3.4.2 build. GCC was updated, so maybe that is the problem... I will try to figure it out tomorrow.

"Finally, I'll note that several of the options that you are providing to build with runtime hardening are now automatically provided by the mongodb build, in particular _FORTIFY_SOURCE=2, -Wl,-z,relro, -Wl,-z,noexecstack, and -fstack-protector-strong. We don't happen to set --param=ssp-buffer-size=4, so if you have some guidance on that parameter we would be interested. Many other options appear to be either MongoDB build system defaults (-O2, -g, -Wall), or GCC defaults (-fexceptions, -grecord-gcc-switches)."

The values in the variables.list file are mainly the values suggested for all Fedora packages, so I am using them. I am not the right person to explain them; I'll ask and let you know.

Comment by Andrew Morrow (Inactive) [ 04/Feb/17 ]

Hi mskalick -

Thank you for the bug report. Our CI builds of 3.4.2 on RHEL 7 ppc64le do not show this issue: https://evergreen.mongodb.com/task/mongodb_mongo_v3.4_enterprise_rhel_71_ppc64le_unittests_3f76e40c105fc223b3e5aac3e20dcd026b83b38b_17_02_01_19_09_59

I've also taken a quick look through the commits between r3.4.1 and r3.4.2, and I don't see any changes that seemed like potential culprits.

Your build command looks reasonable (nice to see --variables-files getting some use!) though there are several RHEL specific things in there that I'm not sure I understand. I also see some CPPDEFINES that are suggestive that you are using the system versions of libraries, but I don't see any --use-system-x calls. Can you please confirm that the SCons invocation is complete?

Given that you say you can't repro this with 3.4.1 but you can with 3.4.2, and we don't see this issue at all, perhaps the best way to proceed would be for you to do a git bisect of the commits between 3.4.1 and 3.4.2 and see if you can find a specific commit that broke it for you. You can shorten the bisect cycle by building just the failing target, build/opt/mongo/bson/mutable/mutable_bson_test, rather than all. If you script the bisect it should be pretty fast.
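
A scripted bisect along these lines might look roughly like the sketch below. The file name bisect_step.sh and the function name are hypothetical, and the scons flags are simply copied from the invocation in the description with `all` replaced by the single failing target. The exit codes follow the `git bisect run` convention: 0 for good, 1 for bad, 125 for "cannot test this commit, skip it".

```shell
# bisect_step.sh (hypothetical name): one build-and-test step for `git bisect run`.
TARGET=build/opt/mongo/bson/mutable/mutable_bson_test

# Both commands can be overridden from the environment, e.g. for a dry run
# without a MongoDB checkout.
: "${BUILD_CMD:=scons -j4 --mmapv1=on --wiredtiger=on --ssl --nostrip --disable-warnings-as-errors --variables-files=variables.list $TARGET}"
: "${TEST_CMD:=./$TARGET}"

bisect_step() {
    # A commit that does not build says nothing about the test: skip it.
    sh -c "$BUILD_CMD" >/dev/null 2>&1 || return 125
    # The build succeeded, so the test result decides good (0) or bad (1).
    sh -c "$TEST_CMD" >/dev/null 2>&1 || return 1
    return 0
}
```

With that file in the source tree, the bisect could be started with `git bisect start r3.4.2 r3.4.1` and driven with `git bisect run sh -c '. ./bisect_step.sh && bisect_step'`.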

I also notice that you are building with --disable-warnings-as-errors. What toolchain or GCC version are you using here?

Finally, I'll note that several of the options that you are providing to build with runtime hardening are now automatically provided by the mongodb build, in particular _FORTIFY_SOURCE=2, -Wl,-z,relro, -Wl,-z,noexecstack, and -fstack-protector-strong. We don't happen to set --param=ssp-buffer-size=4, so if you have some guidance on that parameter we would be interested. Many other options appear to be either MongoDB build system defaults (-O2, -g, -Wall), or GCC defaults (-fexceptions, -grecord-gcc-switches).
