[SERVER-27906] mutable_bson_test failure on ppc64le Created: 03/Feb/17 Updated: 13/Feb/17 Resolved: 13/Feb/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marek Skalický | Assignee: | Andrew Morrow (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
ppc64le system |
||
| Attachments: |
|
||||||
| Backwards Compatibility: | Fully Compatible | ||||||
| Operating System: | ALL | ||||||
| Steps To Reproduce: |
|
||||||
| Sprint: | Platforms 2017-02-13 | ||||||
| Participants: |
| Description |
|
build/opt/mongo/bson/mutable/mutable_bson_test started to fail on ppc64le in 3.4.2 (in 3.4.1 it worked well). Specifically, the "EncodingEquivalenceRegex" subtest is failing. It does not fail on other platforms.
Full output of the test is attached. |
| Comments |
| Comment by Andrew Morrow (Inactive) [ 13/Feb/17 ] |
|
Hi mskalick - Given that, I think it is appropriate to close this issue, since the current hypothesis is a mis-compile due to GCC 7. If your investigation reveals anything different, however (e.g. there is some subtle UB in our code), please feel free to re-open the issue or otherwise bring it back to our attention. |
| Comment by Marek Skalický [ 09/Feb/17 ] |
|
I compiled the test with optimizations enabled (the test fails). I then removed the element.o file and re-ran the gcc command for it (the same one scons invokes for this file), changing only the optimization level to -O0. Then I ran the scons command again, which only regenerated build/7a67446c/mongo/bson/mutable/libmutable_bson.a and relinked the test. After that the test passes. So recompiling element.o alone was enough to get the test passing. |
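The single-object experiment described above can be sketched as a small helper that takes the compiler command printed by the build for element.o and swaps its optimization level. This is only a sketch: the function name `with_opt_level` and the example command line are hypothetical, and the command shown is abbreviated rather than the actual invocation from this build.

```python
import shlex


def with_opt_level(command, level="-O0"):
    """Return the compile command as an argument list with -O flags replaced.

    `command` is the full compiler line as printed by the build system.
    Every argument starting with "-O" (for example -O1, -O2, -Os) is replaced
    by `level`; the lowercase output flag "-o" is left alone. If the command
    has no -O flag at all, `level` is inserted right after the compiler name.
    """
    args = shlex.split(command)
    replaced = [level if arg.startswith("-O") else arg for arg in args]
    if level not in replaced:
        replaced.insert(1, level)
    return replaced


if __name__ == "__main__":
    # Hypothetical, abbreviated command line; the real one comes from the
    # build output for element.o.
    cmd = ("g++ -o build/opt/mongo/bson/mutable/element.o -c -O2 -g -Wall "
           "src/mongo/bson/mutable/element.cpp")
    print(" ".join(with_opt_level(cmd)))
```

The returned list can be passed to `subprocess.run` to rebuild the one object file, after which re-running the build only re-archives the library and relinks the test.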
| Comment by Andrew Morrow (Inactive) [ 08/Feb/17 ] |
|
That is somewhat strange - element.cpp is a pretty small and simple file. It does include a fair amount of other code, and calls into document.cpp, which is much more complex. What leads you to conclude that it is a mis-compilation of element.cpp? |
| Comment by Marek Skalický [ 08/Feb/17 ] |
|
Update: The failure is caused by optimizations while compiling build/opt/mongo/bson/mutable/element.o. With -O1 it fails; with -O0 it works fine. |
| Comment by Marek Skalický [ 07/Feb/17 ] |
|
On other architectures it works fine, so it is a ppc64le-only issue. Even on ppc64be the unit tests pass, so I guess it is a bug in gcc. Any idea how to get a reproducer (i.e. locate the issue)? |
| Comment by Andrew Morrow (Inactive) [ 07/Feb/17 ] |
|
Hi mskalick - It is certainly possible that mutable bson contains a latent defect that has been exposed by GCC 7. If that is the case, I'd expect it to be some sort of subtle undefined behavior. Another possibility is that we are exposing a bug in GCC. Do you get this same behavior on x86_64 Fedora 26 GCC 7 systems? We don't happen to have any Fedora 26 ppc64le machines available, or any ppc64le that we can conveniently repurpose, so it is a little difficult for us to track this down on that platform. I think knowing whether this is arch specific would be a good first step. Thanks for the notes on the --param= stuff. |
| Comment by Marek Skalický [ 07/Feb/17 ] |
|
It is caused by the new gcc 7.0.1. It seems that src/mongo/bson/mutable/element.cpp (or its included headers) is causing the problem. After rebuilding build/opt/mongo/bson/mutable/element.o with gcc 6.3.1, the unit test passes. Do you have any idea what could cause the problem? Which parts of element.cpp are used in the EncodingEquivalenceRegex test?
I've asked about --param=ssp-buffer-size=4: "It's a historic setting for -fstack-protector. Everything should use -fstack-protector-strong these days, so it should no longer be necessary." |
| Comment by Marek Skalický [ 06/Feb/17 ] |
Actually it is Fedora 26, not RHEL. I was building it with --use-system-x calls, but I get the same result without them (the command mentioned in the description of this bug is right). I only forgot to remove the CPP defines; I'll try without them.
It is gcc 7.0.1-0.5.fc26. Using this option from gcc 5.0.o ( There is no change in the Fedora mongodb code (spec file) between the successful 3.4.1 build and the failed 3.4.2 build. Gcc was updated, so maybe that is the problem... I will try to figure it out tomorrow.
The values in the variables.list file are mainly the suggested values for all Fedora packages, so I am using them. I am not the right person to ask about them; I'll ask and let you know. |
| Comment by Andrew Morrow (Inactive) [ 04/Feb/17 ] |
|
Hi mskalick - Thank you for the bug report. Our CI builds of 3.4.2 on RHEL 7 ppc64le do not show this issue: https://evergreen.mongodb.com/task/mongodb_mongo_v3.4_enterprise_rhel_71_ppc64le_unittests_3f76e40c105fc223b3e5aac3e20dcd026b83b38b_17_02_01_19_09_59 I've also taken a quick look through the commits between r3.4.1 and r3.4.2, and I don't see any changes that seemed like potential culprits.
Your build command looks reasonable (nice to see --variables-files getting some use!) though there are several RHEL specific things in there that I'm not sure I understand. I also see some CPPDEFINES that are suggestive that you are using the system versions of libraries, but I don't see any --use-system-x calls. Can you please confirm that the SCons invocation is complete?
Given that you say you can't repro this with 3.4.1 but you can with 3.4.2, and we don't see this issue at all, perhaps the best way to proceed would be for you to do a git bisect of the commits between 3.4.1 and 3.4.2 and see if you can find a specific commit that broke it for you. You can shorten the bisect cycle by building just the failing target, build/opt/mongo/bson/mutable/mutable_bson_test, rather than all. If you script the bisect it should be pretty fast.
I also notice that you are building with --disable-warnings-as-errors. What toolchain or GCC version are you using here?
Finally, I'll note that several of the options that you are providing to build with runtime hardening are now automatically provided by the mongodb build, in particular _FORTIFY_SOURCE=2, -Wl,-z,relro, -Wl,-z,noexecstack, and -fstack-protector-strong. We don't happen to set --param=ssp-buffer-size=4, so if you have some guidance on that parameter we would be interested. Many other options appear to be either MongoDB build system defaults (-O2, -g, -Wall), or GCC defaults (-fexceptions, -grecord-gcc-switches). |
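A scripted bisect along the lines suggested above might look like the following sketch. It relies on the exit-code convention of `git bisect run` (0 marks a commit good, 125 skips it, other codes from 1 to 127 mark it bad). The file name `bisect_test.py`, the helper `bisect_verdict`, the `--run` switch, and the bare `scons <target>` invocation are all assumptions for illustration; the real build would need the same flags as the failing build.

```python
import subprocess
import sys

# Target named in the comment above; building only this keeps each step short.
TARGET = "build/opt/mongo/bson/mutable/mutable_bson_test"

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_verdict(build_rc, test_rc):
    """Map build and test return codes to a `git bisect run` exit code."""
    if build_rc != 0:
        return SKIP  # the commit cannot be built, so it cannot be judged
    return GOOD if test_rc == 0 else BAD


def main():
    # Assumption: plain `scons <target>`; add the flags of the failing build.
    build = subprocess.run(["scons", TARGET])
    if build.returncode != 0:
        return bisect_verdict(build.returncode, None)
    test = subprocess.run([TARGET])
    return bisect_verdict(0, test.returncode)


# The explicit --run switch (hypothetical) keeps the file free of side
# effects unless it is invoked deliberately.
if __name__ == "__main__" and sys.argv[1:] == ["--run"]:
    sys.exit(main())
```

With the script saved as an untracked file in the checkout, the session would be roughly `git bisect start r3.4.2 r3.4.1` followed by `git bisect run python bisect_test.py --run`.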