[SERVER-48286] Handle accuracy issues of casting long long to double Created: 19/May/20  Updated: 26/May/20  Resolved: 26/May/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Justin Seyster
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-37183 BSONElement::safeNumberLong is not safe Closed
Operating System: ALL
Sprint: Query 2020-06-15
Participants:
Linked BF Score: 0

 Description   

A double on 64-bit architectures uses 1 bit for sign, 52 bits for value, and 11 bits for exponent, thus, the maximum integer value that it can contain without losing precision is 0x1FFFFFFFFFFFFF (9,007,199,254,740,991). This is much smaller than LLONG_MAX (9,223,372,036,854,775,807) and the range long long covers.

There are instances that we cast LLONG_MAX to double and do a comparison (see here. We should ensure the accuracy of such comparisons in presence of values that cause precision issues for double.



 Comments   
Comment by Justin Seyster [ 26/May/20 ]

amirsaman.memaripour, thanks for the investigative work on this issue! You are right that the check at bsonelement.h:898 relies on undefined behavior: the result of

(double)std::numeric_limits<long long>::max()

can actually vary depending on the hardware architecture and/or the software environment. The result is a very rare edge case, where on some architectures (in practice, most) this check considers the exact value double{2^63} as be safe for conversion (i.e., within the range of long long) when it actually is not. Casting double{2^63} to long long is also undefined behavior, and we have observed different outcomes (including crashing).

 

We fixed this problem in SERVER-38623, which landed in v4.2, but for compatibility reasons, we can't backport the fix to v4.0. The result of safeNumberLongForHash() gets stored in on-disk indexes, and any change in its behavior could effectively corrupt those indexes. The upgrade instructions for v4.2 include specific steps to make sure that any affected indexes get rebuilt so as to make sure the remain valid with the modified hashing semantics.

If you're interested in the gory details, you can read through the discussions in SERVER-37183 and SERVER-38623.

Because this is fixed in v4.2 but can't be backported, I'm going to close as "Won't fix."

 

Comment by Eric Milkie [ 19/May/20 ]

That second code location is for numberLong(), for which safeNumberLong() was created to avoid the problems of casting double to long long. I wonder if we should try to work towards getting rid of the unsafe numberLong() function – its usage is pervasive across the codebase, unfortunately.

Comment by Amirsaman Memaripour [ 19/May/20 ]

milkie I think it's also possible to fix the issue by changing > to >= in here.

Regarding other instances, I'm not sure if they could cause any issues, but another cast (that could potentially cause issues) is this one.

Comment by Eric Milkie [ 19/May/20 ]

The idea of the code change in SERVER-37183 was to preserve the undefined-behavior for hashing only. It doesn't matter that it is losing precision as long as the behavior remains the same and creates the same hash value for a given double on a given platform. We might need to do something to get the UBSAN to ignore this particular area.

The Description of this ticket says there are instances plural – where else in the code are we doing this?

Generated at Thu Feb 08 05:16:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.