[SERVER-2950] floating point representation varies from one way to insert to another ? Created: 14/Apr/11  Updated: 12/Jul/16  Resolved: 09/Jun/11

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 1.8.0
Fix Version/s: 1.9.1

Type: Question Priority: Major - P3
Reporter: olivier barbecot Assignee: Aaron Staple
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ubuntu 10.10 64 bits



 Description   

Hi,

I'm facing a problem while trying to store geolocated points and the key/value pairs attached to them. I need to check whether a given point (identified by a lat/long pair) already exists in the database.
Depending on how I insert my geolocated points (from pymongo, the JavaScript shell, or mongoimport), I get different results.

For instance, I try to store the value 0.7 as a float using each of the 3 methods.

The results I get are the following:

{ "_id" : ObjectId("4da6e294aea05d721a4a87b9"), "value_from_mongoimport" : 0.7000000000000001 }
{ "_id" : ObjectId("4da6e2a3e36333098e18d40c"), "value_from shell" : 0.7 }
{ "_id" : ObjectId("4da6e2aa11f4ac1bc0000000"), "value_from_pymongo" : 0.7 }

I'm specifically using mongoimport to store the data because it performs better than the other methods.

The problem is that later, when I run something like this from Python:
db.test.find_one({"mykey" : 0.7})
I get None.

I understand the problem of floating point representation, but why does it only affect mongoimport? Does it have something to do with the JSON-to-BSON conversion?

Help appreciated.

Regards.



 Comments   
Comment by auto [ 09/Jun/11 ]

Author: Aaron (astaple) <aaron@10gen.com>

Message: SERVER-2950 use strtod to parse real numbers from json strings
Branch: master
https://github.com/mongodb/mongo/commit/19e4da397533e2b7bbd1c217d70a98a72093738f
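
For illustration only: the parser that mongoimport used before this commit is not shown in this thread, so the sketch below uses a hypothetical hand-rolled parser (naive_parse, an assumption, not mongoimport's actual code) that accumulates digits by multiplying with 0.1. It shows how that style of parsing can land one representable value away from a correctly rounded conversion, which is what Python's float() provides and what a strtod-style conversion is meant to provide.

```python
def naive_parse(text):
    """Hypothetical digit-by-digit parser for unsigned decimals like '0.7'."""
    whole, _, frac = text.partition(".")
    value = float(int(whole)) if whole else 0.0
    scale = 0.1
    for ch in frac:
        # 0.1 is not exactly representable, and every step rounds again.
        value += int(ch) * scale
        scale *= 0.1
    return value


naive = naive_parse("0.7")
correct = float("0.7")  # correctly rounded decimal-to-double conversion

print(repr(naive), repr(correct), naive == correct)
# prints: 0.7000000000000001 0.7 False
```

The naive result happens to match the value reported for mongoimport above, but that only shows the class of error, not the exact cause in the tool.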

Comment by Dave Steinberg [ 03/Jun/11 ]

The issue I see here isn't so much the difference between how python and mongo handle floats, it's the difference between what the shell does and what mongoimport does. Mongo ought to be internally consistent, and from there, the Python/Ruby/etc guys can build their drivers appropriately.

Take this simple document:

{"_id":"a","lat":31.437557,"long":-83.516034}

Calling insert in the shell yields an identical representation. Putting that JSON string into a text file and loading it via mongoimport produces:

{ "_id" : "a", "lat" : 31.437556999999998, "long" : -83.516034 }

Clearly there's a difference in the way those numbers are being handled, and it's not consistent across the toolset.

Comment by Remon van Vliet [ 04/May/11 ]

I'm not sure I agree with that last statement. I agree that equality checks should work if the floating point values are binary equal, and they do. However, mongo has no control over how a specific language/CPU/compiler will convert your "0.7" string to an IEEE 754 floating point value. Many differences are the result of rounding strategies (turning 0.10000000149011 back to 0.1, for example). The most obvious problem with this is that something like a 0.7 / 20000.0 == 0.000035 check can actually return false.
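
To make "binary equal" concrete, here is a small sketch that prints the raw 64-bit patterns of the two values reported in this issue. The helper name double_bits is made up for this example; it only uses the standard struct module.

```python
import struct


def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


from_shell = 0.7                       # value stored by the shell and pymongo
from_mongoimport = 0.7000000000000001  # value stored by mongoimport 1.8.0

print(hex(double_bits(from_shell)))        # 0x3fe6666666666666
print(hex(double_bits(from_mongoimport)))  # 0x3fe6666666666667
# The patterns differ in the last bit, so an exact-match query cannot succeed.
print(double_bits(from_mongoimport) - double_bits(from_shell))  # 1
```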

It's a good software development rule never to do equality checks on floating point numbers. Using small ranges is the standard practice when comparing floating point values; the range is usually referred to as epsilon. In other words, your equality check should be (pseudocode): equal = abs(value1 - value2) < epsilon.
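
A minimal Python version of that pseudocode might look like the following. The function name nearly_equal and the default epsilon of 1e-9 are assumptions for this sketch; a suitable epsilon depends on the precision your data actually needs.

```python
def nearly_equal(value1, value2, epsilon=1e-9):
    """Treat two floats as equal when they differ by less than epsilon."""
    return abs(value1 - value2) < epsilon


print(0.7 == 0.7000000000000001)              # False: exact comparison
print(nearly_equal(0.7, 0.7000000000000001))  # True: within epsilon
print(nearly_equal(0.7, 0.71))                # False: genuinely different
```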

Now, a case can be made for having mongo do this rather than your app, but I think that might make floating point equality tests quite a bit slower. That said, I'm not sure the binary compare is in any way useful in most apps in the floating point case, specifically for the stability/accuracy reasons mentioned above.

Comment by olivier barbecot [ 26/Apr/11 ]

Unfortunately, I am not able to use one language exclusively.

Using floating point queries with small intervals rather than exact matches might be a solution, you're right, but I'm a bit reluctant to take this approach.
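
For reference, a small-interval query could be built roughly as follows. This is a sketch, not a recommendation from the thread: the helper near_value_query and the epsilon of 1e-9 are assumptions, while $gte and $lte are the standard MongoDB range operators. The helper only builds the query document, so it runs without a server.

```python
def near_value_query(field, value, epsilon=1e-9):
    """Build a range query matching field within +/- epsilon of value."""
    return {field: {"$gte": value - epsilon, "$lte": value + epsilon}}


query = near_value_query("v", 0.7)
print(query)

# With a pymongo collection (assumed to be db.test here), it would be used as:
#   db.test.find_one(query)
```

Both 0.7 and 0.7000000000000001 fall inside the generated interval, so the document matches regardless of which tool inserted it.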

Another solution I found is to use a geohash as the _id.
This workaround solved my problem.

I understand that different languages parse and print floats differently. Nevertheless, the languages used in mongodb should all behave the same way.

regards.

Comment by Richard Kreuter (Inactive) [ 19/Apr/11 ]

Different languages parse and print floats differently. The parser we use in mongoimport seems to differ from the ones used in JavaScript or Python, but none of them is exact, and no two of them agree about both parsing and printing (example below).

Are you able to either use one language exclusively, or to code your floating point queries with small intervals rather than exact matches?

$ echo '{_id: 1, v: 0.7}' | mongoimport -d server2950 -c test
connected to: 127.0.0.1
imported 1 objects
$ mongo --eval 'db.test.save({_id:2, v:0.7})' server2950
MongoDB shell version: 1.8.0
connecting to: server2950
$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c=pymongo.Collection()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Collection'
>>> c=pymongo.Connection()
>>> db=c["server2950"]
>>> db.test.save({"_id":3, "v":0.7})
3
>>> for doc in db.test.find():
...     print doc
...
{u'_id': 1, u'v': 0.70000000000000007}
{u'_id': 2.0, u'v': 0.69999999999999996}
{u'_id': 3, u'v': 0.69999999999999996}

>>> exit()
$ mongo --eval 'db.test.find()' server2950
MongoDB shell version: 1.8.0
connecting to: server2950
DBQuery: server2950.test -> undefined
$ mongo --eval 'db.test.find().forEach(printjson)' server2950
MongoDB shell version: 1.8.0
connecting to: server2950

{ "_id" : 1, "v" : 0.7000000000000001 }
{ "_id" : 2, "v" : 0.7 }
{ "_id" : 3, "v" : 0.7 }

$ mongoexport --db server2950 --collection test
connected to: 127.0.0.1

{ "_id" : 1, "v" : 0.7000000000000001 }
{ "_id" : 2, "v" : 0.7 }
{ "_id" : 3, "v" : 0.7 }

exported 3 records
