[SERVER-9342] BSON storing 32 bit floats Created: 12/Apr/13 Updated: 09/Apr/20 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | features we're not sure of |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andrew Emil (Inactive) | Assignee: | DO NOT USE - Backlog - Platform Team |
| Resolution: | Unresolved | Votes: | 6 |
| Labels: | bson, move-sa | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Case: | (copied to CRM) | ||||||||||||
| Description |
|
Add the ability to store 32 bit float values rather than forcing all floating point values to be 64 bits. This would cut down on storage utilization for users who know that the values they are storing are only 32 bits. This includes adding a new type to the BSON specification and implementing support for this new type in the drivers. |
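To make the potential saving concrete, here is a minimal sketch of the per-element arithmetic. It relies on the BSON element layout (one type byte, the key as a NUL-terminated string, then the payload) and on a double payload being 8 bytes. The 4-byte float element is hypothetical, since no such type exists in the BSON specification, and the helper name `element_size` is invented for illustration.

```python
import struct


def element_size(key, payload_bytes):
    """Size in bytes of one BSON element: type byte + key + NUL + payload."""
    return 1 + len(key.encode("utf-8")) + 1 + payload_bytes


# Existing 64-bit double element with a one-character key.
double_size = element_size("v", 8)

# Hypothetical 32-bit float element (assumption: same layout, 4-byte payload).
float32_size = element_size("v", 4)

# Round-tripping through 32 bits loses precision for most decimal values.
as_float32 = struct.unpack("<f", struct.pack("<f", 34.7))[0]

print(double_size, float32_size, as_float32)
```

The saving is 4 bytes per value, so the relative gain shrinks as keys get longer, and the value read back is only an approximation of the original double.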
| Comments |
| Comment by Massimo Redaelli [ 01/Aug/18 ] | ||||
|
We would also be very happy to have this. Any news? | ||||
| Comment by Sam Bennett [ 30/Jun/14 ] | ||||
|
I believe this is linked to PHP-1134. This has now become a major issue. | ||||
| Comment by sachi [ 11/Dec/13 ] | ||||
|
Same here. 90% of our fields contain numbers, and we need only single (32-bit float) precision. Given that we gained a ~70% performance boost by limiting field names to single characters, I am sure there would be a substantial performance improvement if we could store 32-bit floats. | ||||
| Comment by Kevin J. Rice [ 12/Apr/13 ] | ||||
|
Elaboration: We're storing numbers, lots of numbers. Specifically, we'd like to store an array of tightly packed pairs of (16-bit int, 32-bit float) numbers. We know the data type ahead of time for every element in the array. BTW, these are (timestamp, value) pairs that are measurements of system metrics on Sears.com systems. We're currently storing 1.5M metrics per minute in another system, and will, by the end of this year, be storing around 7 million metrics (ts,val pairs) per minute. This takes a lot of storage. Under BSON storage doctrines, we have a doc with:
Each ts is currently a double, as is each val. Thus, the BSON object size grows by 30 bytes for each additional (timestamp, value) pair. We'd like to store data more compactly! We only need 6 bytes per pair: a 2-byte timestamp offset of 0-3600 seconds after 'starttime', plus a 4-byte (32-bit) float measurement. We know the datatypes of this data ahead of time, and DON'T need an array that can hold arbitrary datatypes such as [ 'a', 12, 34.7, ...]. Let's say you give us a nice BSON type that is a binary type. In pymongo, we'd use struct.pack() to create the string (with embedded nulls), or otherwise a 6-byte binary object, that holds the measurement. Alternatively, we could combine a 32-bit timestamp with a 32-bit float, making an 8-byte binary 'thing' to store. Regardless, we'd need to do several operations on this field:
We don't need BSON datatype overhead on every measurement; we want an array of a single datatype. I think storing binary values in an array is probably a fairly common use case for Mongo. Further, the ability to store 32-bit values is probably pretty common too, since I doubt most people are interested in values with 64 bits of precision. As mentioned above, 32 bits give us about 4 billion ints, or a range of roughly ±10**38 for floats. |
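The 6-byte packing described in this comment can be sketched with Python's `struct` module. This is only an illustration under stated assumptions: little-endian byte order, an unsigned 16-bit offset, and the helper names `pack_pairs` and `unpack_pairs` are all choices made here, not anything specified in the ticket. The resulting bytes could be stored in an existing BSON binary field (for example via pymongo's `bson.binary.Binary`), though the server would then treat them as an opaque blob.

```python
import struct

# Assumed layout: little-endian, unsigned 16-bit offset (0-3600 seconds after
# 'starttime') followed by a 32-bit float measurement. "<" disables padding.
PAIR = struct.Struct("<Hf")


def pack_pairs(pairs):
    """Pack (offset_seconds, value) pairs into one contiguous bytes object."""
    return b"".join(PAIR.pack(offset, value) for offset, value in pairs)


def unpack_pairs(blob):
    """Recover the (offset_seconds, value) pairs from a packed bytes object."""
    return list(PAIR.iter_unpack(blob))


samples = [(0, 0.5), (60, 12.25), (3600, 34.7)]
blob = pack_pairs(samples)
decoded = unpack_pairs(blob)

print(PAIR.size, len(blob))
print(decoded)
```

Each pair occupies exactly 6 bytes, compared with the 30 bytes per pair quoted above. Values such as 34.7 come back only approximately because they are rounded to 32-bit precision.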