[SERVER-33395] PPC64 little endian altivec optimizations are broken on newer gcc Created: 20/Feb/18 Updated: 08/Jan/24 Resolved: 04/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Packaging |
| Affects Version/s: | 3.4.13, 3.6.2 |
| Fix Version/s: | 3.4.16, 3.6.6, 4.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marek Skalický | Assignee: | Mira Carey |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: |
v3.6, v3.4
|
||||
| Steps To Reproduce: | $ |
||||
| Sprint: | Platforms 2018-03-26, Platforms 2018-05-07 | ||||
| Participants: | |||||
| Description |
|
Hi,
When I disable/remove altivec include in src/mongo/db/fts/unicode/byte_vector.h everything works. When I debuged byte_vector_test and error {{FAIL: MaskAny Expected ByteVector::countInitialZeros(mask) == offset (16 == 0) @src/mongo/db/fts/unicode/byte_vector_test.cpp:70}} I figured out that vec_extract(vec_vbpermq(_data, bits), 0) in ByteVector::maskHight() returns 0. I don't know if this is mongodb problem or problem in compiler,... Do you have any idea how to easily test it? (I don't know ppc64 altivec instructions well) What could be the minimal reproducer in case of compiler problem? |
| Comments |
| Comment by Githook User [ 22/May/18 ] | ||||||||||||||||||||||||||||
|
Author: {'username': 'hanumantmk', 'name': 'Jason Carey', 'email': 'jcarey@argv.me'}Message: GCC changed the output of vec_vbpermq between 5.4.0 and later. This (cherry picked from commit b989fe4b0cfba4062c5bb7fb68337f188070c1c8) | ||||||||||||||||||||||||||||
| Comment by Githook User [ 16/May/18 ] | ||||||||||||||||||||||||||||
|
Author: {'email': 'jcarey@argv.me', 'username': 'hanumantmk', 'name': 'Jason Carey'}Message: GCC changed the output of vec_vbpermq between 5.4.0 and later. This (cherry picked from commit b989fe4b0cfba4062c5bb7fb68337f188070c1c8) | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 16/May/18 ] | ||||||||||||||||||||||||||||
|
No worries mskalick, I'm currently planning on making a backport to 3.6, but it probably won't land until 3.6.6 (just missed getting it in for the 3.6.5 release). | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 16/May/18 ] | ||||||||||||||||||||||||||||
|
Hi jason. I'm sorry, but currently I'm not building master branch... so it was easier for me to try to apply the patch to 3.6.4 (see attachment). And with this patch unittests and core JS tests are passing for me on both Little and Big Endian ppc64. | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 04/May/18 ] | ||||||||||||||||||||||||||||
|
mskalick, I've pushed a change to pick up at configure time where vec_vbpermq puts it's output. If you test against the master branch of mongodb, and it fixes your issue, I'll go ahead with the backports | ||||||||||||||||||||||||||||
| Comment by Githook User [ 04/May/18 ] | ||||||||||||||||||||||||||||
|
Author: {'email': 'jcarey@argv.me', 'name': 'Jason Carey', 'username': 'hanumantmk'}Message: GCC changed the output of vec_vbpermq between 5.4.0 and later. This | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 24/Apr/18 ] | ||||||||||||||||||||||||||||
|
Current implementation is working with new gcc on ppc64 Big Endian. And with changes proposed in this ticket, the tests are failing on ppc64be. | ||||||||||||||||||||||||||||
| Comment by Apollon Oikonomopoulos [ 23/Mar/18 ] | ||||||||||||||||||||||||||||
|
This is causing build failures on Debian as well. Applying the change fixes the issue, so I've included it in the latest upload. | ||||||||||||||||||||||||||||
| Comment by Dimitri John Ledkov [ 22/Mar/18 ] | ||||||||||||||||||||||||||||
|
I can confirm we are experiencing same test suite failures on Ubuntu now too. Will try out the proposed change. | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 12/Mar/18 ] | ||||||||||||||||||||||||||||
|
Thanks. If you have any question or anything to test, feel free to ask. | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 09/Mar/18 ] | ||||||||||||||||||||||||||||
|
mskalick, I wrote the altivec byte vector implementation. It's perfectly possible that I tried the one side of vbpermq's output, failed to see bytes, then tried the other and stuck with what worked (on gcc 5.4). I'll check the docs and see what the api actually specifies. If that fails we've got some contacts at IBM I can follow up with. | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 07/Mar/18 ] | ||||||||||||||||||||||||||||
|
I can confirm that with this change all three mentioned tests pass. Can you ask someone who wrote this code to know what should be right? If gcc 5.4 has a bug or newer gcc has one. | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 06/Mar/18 ] | ||||||||||||||||||||||||||||
|
Hmm, interesting. This code was written for little endian, and continues to work in our CI environment with gcc 5.4 (on that platform) I suppose the interesting story is if gcc 5.4 had the bug, and later versions actually have the fix. If you swap the 0 for a 1 in byte_vector
Do unit tests pass for you? If so... I suppose I can add explicit compiler version checks, or look into checking the endianness of vec_vbpermq in our configure step | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 05/Mar/18 ] | ||||||||||||||||||||||||||||
|
Jason, please take a look in the gcc bug report. It seems that reproduces is really doing what it should do. So Dave is asking if you really want to use uint64_t mask = vec_extract(vec_vbpermq(vec, bits), 0);? With 0 substituted by 1 reproducer is working fine... | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 28/Feb/18 ] | ||||||||||||||||||||||||||||
|
I don't think that this is about optimizations. Even with g++ -O0 your reproducer aborts. | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 27/Feb/18 ] | ||||||||||||||||||||||||||||
|
Thanks for filing the bug report and for the link! | ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 27/Feb/18 ] | ||||||||||||||||||||||||||||
|
Awesome! Thank you. It aborts. So I reported a bug - https://bugzilla.redhat.com/show_bug.cgi?id=1549628 | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 23/Feb/18 ] | ||||||||||||||||||||||||||||
|
I transcribed out what that test does into a minimal repro. It should abort if vbpermq is working differently than expected:
| ||||||||||||||||||||||||||||
| Comment by Marek Skalický [ 23/Feb/18 ] | ||||||||||||||||||||||||||||
|
I'm not saying it is not compiler bug. Interesting is that it appeared to me in three different environments. | ||||||||||||||||||||||||||||
| Comment by Mira Carey [ 20/Feb/18 ] | ||||||||||||||||||||||||||||
|
My inclination is to say compiler bug. For context: The test is green in our CI environment today (https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_71_ppc64le_compile_all_2ae87330910dd7af58507c77d0363d267be8381e_18_02_16_21_39_39) with gcc 5.4.0 It appears that there is a current gcc bug live on 5.5.0, 6.4.1, 7.2.1, 8.0 which breaks vbpermq in particular (seems like a bad optimization pass) Any chance you're on one of those affected versions? |