[GODRIVER-1616] NaN values in MongoDB Created: 12/May/20 Updated: 27/Oct/23 Resolved: 18/May/20 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | CRUD |
| Affects Version/s: | 1.3.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Hodel | Assignee: | Unassigned |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
linux 4.9.0, mongodb 3.6.14 |
||
| Description |
|
2020/05/12 23:09:12 Error with mongodb://admins.find(): error decoding key osbuilddate(field OsBuildDate): IntDecodeValue can only truncate float64 to an integer type when truncation is enabled
The trouble is that some data put into MongoDB from NodeJS sources, particularly when parseInt or parseFloat is used before insertion, is recorded in the db as NaN. As a result, when Find() is used to decode documents into a Go struct with an int64 field, the whole find() request fails if even a single document holds NaN in that field. I could rewrite the JavaScript side to store null, or nothing at all, whenever the value is NaN; however, NaN isn't exactly nothing. I cannot understand why the Go driver would fail completely in this situation. Please help. |
| Comments |
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
URLs in error messages. Thank You, | ||||||||
| Comment by Divjot Arora (Inactive) [ 13/May/20 ] | ||||||||
|
I reached out to the Node driver team. They provided this example:
The buffer output by serialize shows that the NaN value was stored as a BSON double (the first 4 bytes are the length and the 5th byte, 0x01, is the BSON data type for the first element, which corresponds to the Double type). You can also see this by printing that buffer as a bson.Raw in Go, which yields {"a": {"$numberDouble":"NaN"}}. So I think Node and Go are both working as expected. NaN values are stored as doubles. Go reads the double as a float64, realizes that it can't be converted to an int without loss of precision, and errors. Even if we had a special case for NaN in the code that decodes BSON double values as ints, any decision we make would be "wrong", so we feel it's best to keep the error behavior. | ||||||||
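The byte layout described in the comment above can be reproduced without either driver. The following is a minimal sketch that hand-assembles a one-element BSON document using only the Go standard library; the helper names (encodeDoubleDoc, decodeFirstDouble, nanDoc, holdsNaN) are hypothetical and are not part of any MongoDB API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeDoubleDoc hand-assembles the BSON document {key: v}, storing v
// as a BSON double (element type 0x01). Hypothetical helper, stdlib only.
func encodeDoubleDoc(key string, v float64) []byte {
	// int32 length + type byte + key + NUL + 8-byte double + terminator
	total := 4 + 1 + len(key) + 1 + 8 + 1
	doc := make([]byte, total)
	binary.LittleEndian.PutUint32(doc[0:4], uint32(total))
	doc[4] = 0x01 // BSON element type: double
	copy(doc[5:], key)
	// doc[5+len(key)] is already 0x00 and terminates the key.
	valueAt := 5 + len(key) + 1
	binary.LittleEndian.PutUint64(doc[valueAt:valueAt+8], math.Float64bits(v))
	// The last byte is already 0x00 and terminates the document.
	return doc
}

// decodeFirstDouble reads the first element back, provided it is a double.
func decodeFirstDouble(doc []byte) (float64, bool) {
	if len(doc) < 7 || doc[4] != 0x01 {
		return 0, false
	}
	i := 5
	for i < len(doc) && doc[i] != 0 { // skip the key
		i++
	}
	if i+1+8 > len(doc) {
		return 0, false
	}
	bits := binary.LittleEndian.Uint64(doc[i+1 : i+9])
	return math.Float64frombits(bits), true
}

// nanDoc builds {a: NaN}, the document discussed in the comment.
func nanDoc() []byte { return encodeDoubleDoc("a", math.NaN()) }

// holdsNaN reports whether the first element is a double holding NaN.
func holdsNaN(doc []byte) bool {
	v, ok := decodeFirstDouble(doc)
	return ok && math.IsNaN(v)
}

func main() {
	doc := nanDoc()
	fmt.Printf("% x\n", doc) // hex dump: length, type byte, key, double, terminator
	fmt.Println("first element is a NaN double:", holdsNaN(doc))
}
```

The point of the sketch is that NaN has no element type of its own: it travels as an ordinary 8-byte double whose bit pattern happens to be an IEEE 754 NaN.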
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
Feel free to close this issue, it doesn't seem like I can and it will be a nice record for the search engines to index when other people question this activity. | ||||||||
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
Still would be nice to know:
And how the NodeJS driver maps Number data types to BSON_DATA structures. Something like a simple truth table for that, just like you find with common logic gates.
Unrelated though, there just isn't room to store NaN if you efficiently store an integer. | ||||||||
| Comment by Divjot Arora (Inactive) [ 13/May/20 ] | ||||||||
|
The Go driver can also store values into the database as NaN, but these have to be float64 values in Go for which math.IsNaN returns true (e.g. the value returned by math.NaN()). JavaScript's type system is very different from Go's, and the parseInt() docs say that the function returns either an integer or NaN. This isn't really a thing in Go. strconv.ParseInt will either return an int64 or will error. It can't return NaN because doing so would require returning a float64 value. Also, this is mixing encoding and decoding. The error from the ticket description happens when decoding and, in my opinion, makes sense. If the value in the database is an IEEE 754 NaN, we can't reliably convert it to an integer without losing precision, as there's no NaN integer value. Doing a naive cast like int64(float64FromDatabase) gives math.MinInt64, which probably isn't what you want. Other languages show this behavior as well. For example, see https://stackoverflow.com/questions/10366485/problems-casting-nan-floats-to-int for a C example. | ||||||||
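The contrast drawn above between strconv.ParseInt and NaN can be sketched with the standard library alone. The helpers toInt64Checked and convertParsed below are hypothetical names introduced for illustration, not driver functions. Note that the Go specification leaves the result of converting an out-of-range or NaN float to an integer implementation-dependent, so the sketch refuses such values instead of casting them.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
)

var errNotIntegral = errors.New("float64 value has no exact int64 representation")

// toInt64Checked converts f only when it is a whole number in the int64 range.
func toInt64Checked(f float64) (int64, error) {
	const two63 = 1 << 63 // 2^63, exactly representable as a float64
	if math.IsNaN(f) || math.IsInf(f, 0) {
		return 0, errNotIntegral
	}
	if f != math.Trunc(f) {
		return 0, errNotIntegral // a fractional part would be lost
	}
	if f < -two63 || f >= two63 {
		return 0, errNotIntegral // outside [-2^63, 2^63)
	}
	return int64(f), nil
}

// convertParsed parses s as a float64 first (so "NaN" is accepted),
// then applies the checked conversion.
func convertParsed(s string) (int64, error) {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, err
	}
	return toInt64Checked(f)
}

func main() {
	// ParseInt has no NaN to return; invalid input is reported as an error.
	_, err := strconv.ParseInt("not a number", 10, 64)
	fmt.Println("ParseInt failed:", err != nil)

	// ParseFloat can produce NaN, because float64 has a NaN value.
	f, _ := strconv.ParseFloat("NaN", 64)
	fmt.Println("ParseFloat produced NaN:", math.IsNaN(f))

	// The checked conversion rejects NaN rather than inventing an integer.
	_, err = convertParsed("NaN")
	fmt.Println("NaN rejected:", err != nil)
}
```

Returning an error keeps the decision about what NaN should mean (zero, a sentinel, a skipped record) with the caller, which is the same reasoning the driver applies when it refuses to decode.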
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
And how the NodeJS driver maps Number data types to BSON_DATA structures. Something like a simple truth table for that, just like you find with common logic gates. | ||||||||
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
It certainly allows you to store NaN values from a number, maybe internally it is using a float as the Number type is the same for floats and integers. Regardless of JSON or BSON, BSON is being used internally.
In order to make MongoDB JSON-first, but still high-performance and general-purpose, BSON was invented to bridge the gap: a binary representation to store data in JSON format, optimized for speed, space, and flexibility. It’s not dissimilar from other interchange formats like protocol buffers, or thrift, in terms of approach. I guess that goes to show that they left out the explanation of using BSON integers for performance reasons in most places, how silly would that become...? I suppose we really need to find out if BSON_DATA_INT supports NaN values. | ||||||||
| Comment by Divjot Arora (Inactive) [ 13/May/20 ] | ||||||||
|
I don't know about the Node driver behavior offhand, but I can reach out to the team tomorrow to get more details about the behavior. Using "@" does bring up a dropdown menu of names for me so I'm not sure? It's possible that it only works for internal accounts. Not a big deal, though. Once I've commented on a ticket, I receive notifications for all new comments automatically. | ||||||||
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
https://mongodb.github.io/node-mongodb-native/api-bson-generated/bson.html
| ||||||||
| Comment by Andrew Hodel [ 13/May/20 ] | ||||||||
|
How does the NodeJS MongoDB driver manage to store values from parseInt() as NaN values in MongoDB then? Also, how did you link my name? Typing @Divjot Arora doesn't bring up a tooltip. | ||||||||
| Comment by Divjot Arora (Inactive) [ 13/May/20 ] | ||||||||
|
Update: I have verified that casting a NaN float64 value to int64 gives math.MinInt64. | ||||||||
| Comment by Divjot Arora (Inactive) [ 13/May/20 ] | ||||||||
|
BSON does not have a data type corresponding to NaN. It does have "Double", which corresponds to a Go float64. My understanding is that NaN is represented in the database as a float64 value in which the bits are laid out to represent NaN in IEEE 754 format. Given this, the driver is correctly saying that it read a float64 value. The other issue is truncation. By default, the driver will only allow float values to be decoded into integer types if doing so would not cause a loss of precision. You can opt in to truncation, even if it will cause loss of precision, by using the truncate struct tag (see the second code section of https://pkg.go.dev/go.mongodb.org/mongo-driver/bson?tab=doc#hdr-Structs).
NaN is a special float64 value, though, so I'm not sure this is a good idea. You'll likely end up with a very large or very small integer that won't make sense. I can investigate a bit into the behavior tomorrow to give you more specific details. Either way, I'd recommend sanitizing the data so you don't try to cast NaN values to integers. Hope this gives you a sense of what the driver is doing. Let me know if I can clarify anything. – Divjot | ||||||||
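The two options mentioned in this comment (the truncate struct tag, and sanitizing the data) might look roughly like the sketch below. The struct names adminTruncating and adminRaw and the BuildDate method are hypothetical, and the field key osbuilddate is taken from the error message in the description. The program imports only the standard library, so the bson tags are inert here; they would take effect only when the structs are decoded with go.mongodb.org/mongo-driver/bson.

```go
package main

import (
	"fmt"
	"math"
)

// adminTruncating opts in to truncation through the driver's struct tag.
// Shown for the tag syntax only; it is not decoded in this sketch.
type adminTruncating struct {
	OsBuildDate int64 `bson:"osbuilddate,truncate"`
}

// adminRaw keeps the stored double as a float64, so a NaN written by
// another client decodes without error and can be inspected afterwards.
type adminRaw struct {
	OsBuildDate float64 `bson:"osbuilddate"`
}

// BuildDate returns the build date as an integer. ok is false when the
// stored value is NaN, infinite, or outside the int64 range.
func (a adminRaw) BuildDate() (n int64, ok bool) {
	const two63 = 1 << 63 // 2^63, exactly representable as a float64
	f := a.OsBuildDate
	if math.IsNaN(f) || math.IsInf(f, 0) || f < -two63 || f >= two63 {
		return 0, false
	}
	return int64(f), true // any fractional part is dropped on purpose
}

// Sample values standing in for decoded documents (assumed data).
var (
	goodAdmin = adminRaw{OsBuildDate: 1589324952}
	nanAdmin  = adminRaw{OsBuildDate: math.NaN()}
)

func main() {
	for _, a := range []adminRaw{goodAdmin, nanAdmin} {
		if n, ok := a.BuildDate(); ok {
			fmt.Println("build date:", n)
		} else {
			fmt.Println("build date missing or invalid")
		}
	}
}
```

Decoding into a float64 field and converting afterwards keeps a single bad document from failing the whole query, at the cost of one explicit check wherever the value is used.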
| Comment by Andrew Hodel [ 12/May/20 ] | ||||||||
|
It's the same as always, thousands of hosts sending tons of data and not worth using floats. Let me know something, I'll likely just change it in JS. | ||||||||
| Comment by Andrew Hodel [ 12/May/20 ] | ||||||||
|
Also, the error message incorrectly describes the field as float64 when it is actually NaN. --edit well it's float64 in go, anyhow you get the idea... I guess this is really why doesn't go itself have NaN within int and int64...??? I understand that go doesn't have a Number type like javascript with NaN, but https://golang.org/pkg/math/#IsNaN exists which is probably why it is doing this. The trouble is that in JS I can have an integer value that is NaN, assuming invalid input (like a string or something) to parseInt(). |