[SERVER-37998] Upgrade Intel Decimal FP Library to at least 2.0 Update 2 Created: 07/Nov/18 Updated: 28/Aug/23
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathew Robinson (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | third-party-libraries |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Service Arch |
| Sprint: | Dev Tools 2019-01-14 |

| Comments |
| Comment by April Schoffer [ 29/Jun/20 ] |

We attempted to upgrade, but this will involve additional collaboration with Intel. Sending to backlog for now.
| Comment by Ryan Egesdahl (Inactive) [ 26/Jun/20 ] |

geert.bosch This issue is just being set aside for now. We intend to communicate with upstream on this further when we can focus more effort on it. Regarding the compiler options, unfortunately I don't think we're able to select a specific C compiler for anything in particular. However, the compiler option should still have had an effect, because gcc applies it based on the source file type rather than on whether you invoked gcc versus g++.
| Comment by Geert Bosch [ 26/Jun/20 ] |

Note that this option does not have an effect for C++ compilations, so make sure to compile using the C compiler. I agree with putting this in the backlog, but I do think it's important to provide as much feedback as possible to the maintainers about the issues that prevent our upgrade. For example, their exponentiation fixes were counterproductive and caused my carefully considered test to fail. While it's not a big deal, as we don't promise specific performance guarantees for numerical operations on decimal, it's still something to worry about.

The most important thing is to let the library maintainers know that we deeply care about these issues, and that we care about accuracy and correctness foremost. I hope that the move to GitHub will make our interactions more productive, as we'd be able to present patches to fix current issues.
| Comment by Ryan Egesdahl (Inactive) [ 26/Jun/20 ] |

For what it's worth, I tried compiling with -fexcess-precision=standard (among many other options previously), and the test still hangs in the same way as before. The problem only affects GCC - other compilers seem fine. I'm going to re-backlog this item for now.
| Comment by Andrew Morrow (Inactive) [ 23/Jun/20 ] |

geert.bosch - I'm inclined to agree that we should stop progress on this. We attempted the upgrade as we had planned to do once v3 was in play, and we have found that it isn't what we need. I'm unaware of any specific bugs that motivate an update right now. I do agree that we should reach out to the maintainers. They are generally responsive.
| Comment by Geert Bosch [ 22/Jun/20 ] |

I'm quite familiar with GCC bug 323, but it should no longer be an issue on x86-64, as the ABI uses SSE2 now for argument passing and return, and implicit promotion to long double should not happen. It's been a while since I looked, but can you verify that this is indeed related to excess precision? Does the Intel code in question use long double? If so, there are compilation options for C code to work around those excess precision issues, so we could consider compiling the Intel library that way.

Still, overall, my recommendation is to not upgrade the Intel library, but to report the issues we're finding as bugs against that library.
| Comment by Ryan Egesdahl (Inactive) [ 20/Jun/20 ] |

I've managed to update the source, but it looks like the problem geert.bosch noted a year and a half ago still exists in the v3 toolchain. I did a bit more digging, and I came up with some GCC bug reports that explain what's going on:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

The low number of the first bug report should concern you. It's been around for 20 years. The bug report is fascinating both from a technical and a psychosocial standpoint if you are truly interested, but for those who are not, the TL;DR version is this: GCC doesn't implement IEEE semantics for excess floating-point precision for C++, it never has, and nobody seems interested in making it happen.

In other words, this is not a problem with any particular version of our toolchain but a compatibility problem between this library and GCC, and neither the GCC developers nor Intel see it as one worth solving right now. There are ways we could address it ourselves (assuming we would rather not wait on what is likely to never happen), but they are almost certainly going to involve custom patches to GCC or changing over to Clang as our primary compiler. Or we can use a different decimal floating-point library, of course.

The core of the problem is that Intel develops this library for the Intel compilers. All other compilers, including GCC, are of secondary importance. Unfortunately, that leaves us with no small choices to make here except to not upgrade the library at this point. It needs to wait until we've made some much larger decisions about either our toolchain or our choice of library.
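[Editorial note] To make the excess-precision hazard concrete, here is a small illustration using Python's decimal module. This is only an analogy for what the hardware does with binary floats, not the Intel library's code: computing at a wider intermediate precision and then rounding to the target precision (double rounding) can produce a different answer than rounding once.

```python
from decimal import Decimal, Context

x = Decimal("1.00500001")

# Round once, directly to 3 significant digits: the remainder is just
# over half, so the result rounds up.
direct = Context(prec=3).plus(x)  # 1.01

# Round first at a wider precision (6 digits), then down to 3 digits,
# mimicking excess intermediate precision. The wider rounding discards
# the trailing "001", leaving an exact tie that half-even rounds DOWN.
wide = Context(prec=6).plus(x)         # 1.00500
two_step = Context(prec=3).plus(wide)  # 1.00

print(direct, two_step)  # different results from the same input
```

The same effect, in binary, is why code that is correct under strict IEEE double semantics can misbehave when intermediates are silently kept in x87 80-bit registers.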
| Comment by Andrew Morrow (Inactive) [ 27/Jun/19 ] |

geert.bosch - Should we try this again now that we have the v3 toolchain in place? Should be easy to rebase and patch queue it.
| Comment by Andrew Morrow (Inactive) [ 25/Jan/19 ] |

Per discussion with geert.bosch, we are probably going to hold off on this until upstream makes changes. Marking as features-we-are-not-sure-of.
| Comment by Geert Bosch [ 14/Dec/18 ] |

The new version of this library has changes to the bid128_pow function in addition to the bid128_acos fix we need. However, compiling the math library with optimization on and the GCC from the MongoDB toolchain results in a hang when evaluating bid128_pow(Decimal128("0.9999999999999999999999999999999999"), Decimal128("0.9999999999999999999999999999999999")).

The new library produces a result that has a 1 unit in the last place (ulp) difference and yields NumberDecimal("5192296858534827628530496329220097E-112"). As the power function does not purport to be correctly rounded (neither does the double-precision floating-point version), that is not necessarily a disqualifying issue for the update. The reason for the error is that there is new code that uses repeated squaring and multiplication for integer exponents. For negative exponents, a final division is used, but as 5^112 has 79 decimal digits, this divisor is rounded, causing double rounding in the final result. The repeated squaring and multiplication method results in an error of up to 0.5 ulp per iteration. As the new code path is used for arguments up to 2^32, the final error can be up to about 16 ulp, though cancellation would result in a typical error much less than that.

It probably would make sense to check whether the mis-compilation goes away with the GCC from toolchain v3. I verified it does not affect the MS VS2017 optimized build. We should also visually check warnings, especially in the new code, to see if those indicate issues.

One positive takeaway of the new library is that the code is still maintained, and that there are no changes to any of the conversion code, so we can have confidence that conversions behave exactly as before.
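[Editorial note] The arithmetic behind this comment is easy to verify, and the repeated-squaring scheme is easy to sketch. The following is a rough illustration in Python's decimal module (at decimal128's 34-digit precision), not the Intel library's actual code; the `pow_by_squaring` helper is a hypothetical stand-in for the new integer-exponent path:

```python
from decimal import Decimal, Context

CTX = Context(prec=34)  # decimal128 carries a 34-digit coefficient

# 5^112 is exact as an integer and has 79 decimal digits, so any
# decimal128 divisor holding it must be rounded -- the source of the
# double rounding described for negative exponents.
assert len(str(5 ** 112)) == 79

# The exact value of 1/5^112 is 2^112 * 10^-112, and 2^112 has exactly
# 34 digits: it is the quoted coefficient, minus the observed 1-ulp
# error (...096 exact vs ...097 returned by the library).
assert str(2 ** 112) == "5192296858534827628530496329220096"

def pow_by_squaring(x: Decimal, n: int) -> Decimal:
    """Integer power via repeated squaring, rounding to 34 digits after
    every multiply; each rounding can contribute up to 0.5 ulp of error,
    which is how errors accumulate over up to ~32 iterations for
    exponents below 2^32."""
    result = Decimal(1)
    base = x
    while n:
        if n & 1:
            result = CTX.multiply(result, base)
        base = CTX.multiply(base, base)
        n >>= 1
    return result

x = Decimal("0.9999999999999999999999999999999999")
approx = pow_by_squaring(x, 112)

# Reference: compute at much higher precision, round once at the end.
ref = CTX.plus(Context(prec=60).power(x, Decimal(112)))
print(approx, ref)
```

For this particular input the per-step rounding losses are tiny, so the sketch stays within a couple of ulp of the single-rounded reference; the 16-ulp figure above is the worst-case bound, not the typical error.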