[SERVER-9342] BSON storing 32 bit floats Created: 12/Apr/13  Updated: 09/Apr/20

Status: Open
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: features we're not sure of

Type: Improvement Priority: Major - P3
Reporter: Andrew Emil (Inactive) Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Unresolved Votes: 6
Labels: bson, move-sa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-9380 Add single-type arrays Open
Participants:
Case:

 Description   

Add the ability to store 32-bit float values rather than forcing all floating-point values to be 64 bits. This would cut storage use for users who know that their values need only 32 bits. It requires adding a new type to the BSON specification and implementing support for that type in the drivers.
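To make the trade-off concrete, here is a minimal sketch that uses only Python's standard struct module. It compares the 8-byte payload of a double with the 4-byte payload a 32-bit float would need, and shows the precision given up in exchange. The sample value is made up for illustration, and this only measures the raw payload, not a real BSON encoding.

```python
import struct

value = 98.6  # hypothetical measurement, chosen only for illustration

as_double = struct.pack("<d", value)  # 8-byte IEEE 754 double, as BSON stores today
as_float = struct.pack("<f", value)   # 4-byte IEEE 754 single, as requested here

size_double = len(as_double)
size_float = len(as_float)

# Round-tripping through 32 bits keeps only about 7 significant decimal digits.
round_tripped = struct.unpack("<f", as_float)[0]
error = abs(round_tripped - value)

print(size_double, size_float, error)
```

The payload halves, and the round-trip error is small but non-zero, which is acceptable only for users who know they need no more than single precision.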



 Comments   
Comment by Massimo Redaelli [ 01/Aug/18 ]

We would also be very happy to have this. Any news?

Comment by Sam Bennett [ 30/Jun/14 ]

I believe this is linked to PHP-1134. This has now become a major issue.

Comment by sachi [ 11/Dec/13 ]

Same here. 90% of our fields contain numbers, and we need only float precision. Given that we gained a ~70% performance boost by limiting field names to single characters, I am sure there would be quite a big performance improvement if we could just store floats.

Comment by Kevin J. Rice [ 12/Apr/13 ]

Elaboration:

We're storing numbers, lots of numbers. Specifically, we'd like to store an array of tightly packed pairs of (16-bit int, 32-bit float) numbers. We know the data type ahead of time for every element in the array.

BTW, these are (timestamp, value) pairs that are measurements of system metrics on Sears.com systems. We're currently storing 1.5M metrics per minute in another system, and will, by the end of this year, be storing around 7 million metrics (ts,val pairs) per minute.

This takes a lot of storage. Under the current BSON storage rules, we have a doc with:

{ ...
  'starttime' : ts0,
  'vals' : [   [ts1, val1], [ts2, val2], ...]
}

Each ts is a double now, as is each val. Thus, the BSON object size grows by about 30 bytes for each additional (timestamp, value) pair.
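The 30-byte figure can be checked by hand from the BSON layout: every element is a type byte, a NUL-terminated key, and a payload, and an array is a document whose keys are the decimal indices "0", "1", and so on. The sketch below only does that arithmetic; it does not call any driver, and the helper names are made up for illustration.

```python
DOUBLE_SIZE = 8  # payload size of a BSON double


def element_size(key: str, value_size: int) -> int:
    """Type byte + key as a NUL-terminated cstring + payload."""
    return 1 + len(key.encode("utf-8")) + 1 + value_size


def document_size(element_sizes) -> int:
    """int32 length prefix + all elements + trailing NUL byte."""
    return 4 + sum(element_sizes) + 1


def pair_size(index: int) -> int:
    """Bytes added to 'vals' by one [ts, val] pair stored at the given index."""
    inner = document_size([
        element_size("0", DOUBLE_SIZE),  # ts
        element_size("1", DOUBLE_SIZE),  # val
    ])
    return element_size(str(index), inner)


print(pair_size(0), pair_size(10), pair_size(100))
```

Under these assumptions the cost is 30 bytes only for the first ten pairs; it rises by one byte each time the array index gains a digit, because the index itself is stored as a string key.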

We'd like to store data more compactly! We only need 6 bytes per pair: That's a 2-byte timestamp offset of 0-3600 seconds after 'starttime', plus a 4-byte/32-bit float measurement. We know the datatypes of this data ahead of time, and DON'T need to create an array that can have [ 'a', 12, 34.7, ...] arbitrary datatypes.

Let's say you give us a nice BSON type that is a binary type. In pymongo, we'll use struct.pack() to create the string (with embedded nulls), or otherwise a binary object of length 6 bytes that is the measurement. Alternately, we could just create a 32-bit timestamp plus a 32-bit float, making an 8-byte binary 'thing' to store. Regardless, we'd need to do several operations on this field:

  • append one measurement to the right;
  • remove (shorten) the field by n elements (more than one!) from the left (removing old data).
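A rough client-side sketch of the 6-byte layout and the two operations, using only Python's struct module. The helper names are hypothetical, and the byte order is an assumption: "<" is chosen because it also disables padding (the native "Hf" format would align the float and take 8 bytes). This rewrites the whole blob in the client; doing the same thing in place on the server is what this ticket asks for.

```python
import struct

# Little-endian, unpadded record: uint16 seconds offset + float32 value = 6 bytes.
RECORD = struct.Struct("<Hf")


def append_measurement(blob: bytes, offset_s: int, value: float) -> bytes:
    """Append one (offset, value) record to the right of the blob."""
    return blob + RECORD.pack(offset_s, value)


def drop_oldest(blob: bytes, n: int) -> bytes:
    """Remove the n oldest records from the left of the blob."""
    return blob[n * RECORD.size:]


def decode(blob: bytes):
    """Unpack the blob into a list of (offset, value) tuples."""
    return list(RECORD.iter_unpack(blob))


blob = b""
for offset_s, value in [(0, 0.5), (60, 1.25), (3600, 2.5)]:
    blob = append_measurement(blob, offset_s, value)

trimmed = drop_oldest(blob, 2)
print(len(blob), len(trimmed), decode(trimmed))
```

Three measurements take 18 bytes here, against roughly 90 bytes in the nested-array form described above.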

We don't need BSON datatype overhead on every measurement; we want an array of a single datatype.

I think this is probably a fairly common use case for storing data in Mongo: storing binary values in an array. Further, the ability to store 32-bit values is probably pretty common, too, since I doubt most people need the precision of a 64-bit value. 32 bits give us about 4 billion distinct ints, or a range of roughly ±3.4 × 10**38 for floats.
