[SERVER-2950] floating point representation varies from one way to insert to another ? Created: 14/Apr/11 Updated: 12/Jul/16 Resolved: 09/Jun/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Tools |
| Affects Version/s: | 1.8.0 |
| Fix Version/s: | 1.9.1 |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | olivier barbecot | Assignee: | Aaron Staple |
| Resolution: | Done | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
ubuntu 10.10 64 bits |
||
| Issue Links: |
|
||||
| Participants: | |||||
| Description |
|
Hi, I'm facing a problem while trying to store geolocated points and the key/value pairs attached to them. I need to check whether a given point (identified by a lat/long pair) already exists in the database. For instance, I store the value 0.7 as a float in each of 3 ways (mongoimport, the shell, and pymongo). The results I get are the following:
{ "_id" : ObjectId("4da6e294aea05d721a4a87b9"), "value_from_mongoimport" : 0.7000000000000001 }
{ "_id" : ObjectId("4da6e2a3e36333098e18d40c"), "value_from shell" : 0.7 }
{ "_id" : ObjectId("4da6e2aa11f4ac1bc0000000"), "value_from_pymongo" : 0.7 }
I'm using the mongoimport way specifically because it performs better than the others. The problem is that later, when I do, from Python, something like : ), I get none ...... I understand the problem of floating point representation, but why does it only affect mongoimport? Is this something to do with the JSON to BSON transformation? Help appreciated. Regards. |
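To make the mismatch concrete, here is a minimal Python sketch that compares the raw IEEE 754 bit patterns of the two values quoted in the description. The helper names `double_bits` and `ulp_distance` are made up for this illustration and are not part of MongoDB or pymongo; the sketch only assumes that both values are stored as 64-bit doubles.

```python
import struct


def double_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def ulp_distance(a: float, b: float) -> int:
    """Number of representable steps between two positive finite doubles."""
    return abs(double_bits(a) - double_bits(b))


# Values copied from the documents in the description.
from_shell = 0.7
from_mongoimport = 0.7000000000000001

print(hex(double_bits(from_shell)))
print(hex(double_bits(from_mongoimport)))
print(ulp_distance(from_shell, from_mongoimport))
print(from_shell == from_mongoimport)
```

If the two patterns differ, the stored values are different doubles, so an exact-match query built from one of them cannot match a document holding the other.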
| Comments |
| Comment by auto [ 09/Jun/11 ] |
|
Author: {u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'} Message: |
| Comment by Dave Steinberg [ 03/Jun/11 ] |
|
The issue I see here isn't so much the difference between how Python and Mongo handle floats; it's the difference between what the shell does and what mongoimport does. Mongo ought to be internally consistent, and from there, the Python/Ruby/etc. guys can build their drivers appropriately. Take this simple document:
{"_id":"a","lat":31.437557,"long":-83.516034}
Calling insert in the shell yields an identical representation. Stuffing that JSON string into a text file and loading it via mongoimport produces:
{ "_id" : "a", "lat" : 31.437556999999998, "long" : -83.516034 }
Clearly there's a difference in the way those numbers are being handled, and it's not consistent across the toolset. |
| Comment by Remon van Vliet [ 04/May/11 ] |
|
Not sure if I agree with that last statement. I agree that equality checks should work if the floating point values are binary equal, and they do. However, Mongo has no control over how a specific language/CPU/compiler will convert your "0.7" string to an IEEE 754 floating point value. Many differences are the result of rounding strategies (turning 0.10000000149011 back to 0.1, for example). The most obvious problem with this is that something like a 0.7 / 20000.0 == 0.000035 check will actually return false. It's a good software development rule never to do equality checks on floating point numbers. Using small ranges is the standard practice when comparing floating point values, where the range is usually referred to as epsilon. In other words, your equality check should be (pseudo): equal = abs(value1 - value2) < epsilon. Now, a case can be made for making Mongo do this rather than your app, but I think that might make floating point equality tests quite a bit slower. That said, I'm not sure the binary compare is in any way useful in most apps for the floating point case specifically, for the stability/accuracy reasons mentioned above. |
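The epsilon comparison described above could be sketched in Python as follows. The function names `approx_equal` and `range_filter`, the field name `lat`, and the default epsilon of 1e-9 are assumptions chosen for illustration; only the `$gte` and `$lte` query operators come from MongoDB itself.

```python
import math


def approx_equal(a: float, b: float, epsilon: float = 1e-9) -> bool:
    """The pseudo-code check from the comment: abs(value1 - value2) < epsilon."""
    return abs(a - b) < epsilon


def range_filter(field: str, value: float, epsilon: float = 1e-9) -> dict:
    """Build a MongoDB filter matching values within epsilon of `value`."""
    return {field: {"$gte": value - epsilon, "$lte": value + epsilon}}


# Hypothetical field name; the value is taken from the comments in this ticket.
query = range_filter("lat", 31.437557)
print(query)

print(approx_equal(0.7, 0.7000000000000001))
# The standard library offers a relative-tolerance variant of the same idea.
print(math.isclose(0.7, 0.7000000000000001, rel_tol=1e-12))
```

With pymongo, a filter like this would be passed to something such as `collection.find_one(query)`, where `collection` stands for whatever collection object the application already has. The right epsilon depends on the precision the coordinates actually need.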
| Comment by olivier barbecot [ 26/Apr/11 ] |
|
Unfortunately, I am not able to use one language exclusively. Using floating point queries with small intervals rather than exact matches might be a solution, you're right, but I'm a bit reluctant to take this approach. Another solution I found is to use a geohash as the _id. I understand that different languages parse and print floats differently. Nevertheless, the languages used in MongoDB should all act the same. Regards. |
| Comment by Richard Kreuter (Inactive) [ 19/Apr/11 ] |
|
Different languages parse and print floats differently. The parser we use in mongoimport seems to differ from the ones used in JavaScript or Python, but none of them is exact, and no two of them agree about both parsing and printing (example below). Are you able to either use one language exclusively, or to code your floating point queries with small intervals rather than exact matches?
$ echo '{_id: 1, v: 0.7}' | mongoimport -d server2950 -c test
)
>>> exit()
$ mongoexport --db server2950 --collection test
exported 3 records |
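As a small illustration of the parsing-versus-printing point, the following Python sketch prints one double in two ways and then parses a nearby decimal string. It only shows Python's own behaviour and makes no claim about what the mongoimport parser does internally; the variable names are chosen for this example.

```python
x = 0.7

# Printing: the same double can be shown with different digit counts.
shortest = repr(x)             # shortest string that round-trips in Python 3
seventeen = format(x, ".17g")  # 17 significant digits
print(shortest, seventeen)

# Both strings parse back to the identical double.
print(float(shortest) == float(seventeen))

# Parsing: a decimal string that is close to 0.7 but maps to another double.
neighbour = float("0.7000000000000001")
print(neighbour == x)
```

A printing difference alone is harmless for exact matches, since both strings denote the same stored value. A parsing difference is what breaks them, because the document then holds a different double.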