[SERVER-8716] Various update() operators for Binary Data Created: 25/Feb/13  Updated: 06/Apr/23

Status: Backlog
Project: Core Server
Component/s: Write Ops
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Richard Kreuter (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-6399 Refactor update() code Closed
Related
related to SERVER-4362 Add XOR operator to $bit Closed
related to SERVER-9380 Add single-type arrays Open
related to SERVER-3281 Support $bit operator for binary types Backlog
is related to SERVER-55386 Support hex conversion and binary ope... Backlog
Assigned Teams:
Query Execution
Participants:
Case:

 Description   

There are a few update operations that it might be nice to offer on binary data objects in documents (a hypothetical syntax sketch follows the list):

  • append/concatenate (maybe this could be an overload to $push?)
  • slice/subsequence
  • replace
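
As a purely hypothetical sketch of that syntax (none of these operators exist; $binAppend, $binSlice, $binReplace and the collection/field names are invented for illustration), the operators might be invoked from PyMongo roughly like this:

# Hypothetical sketch only: $binAppend / $binSlice / $binReplace do not exist
# in MongoDB; the names and argument shapes are invented to illustrate the idea.
from bson import Binary
from pymongo import MongoClient

coll = MongoClient().test.blobs

# append/concatenate: add bytes to the end of an existing binary field
coll.update_one({"_id": 1}, {"$binAppend": {"payload": Binary(b"\x05\x06")}})

# slice/subsequence: keep only bytes [offset, offset + length)
coll.update_one({"_id": 1}, {"$binSlice": {"payload": {"offset": 0, "length": 8}}})

# replace: overwrite bytes starting at a given offset
coll.update_one({"_id": 1}, {"$binReplace": {"payload": {"offset": 4, "data": Binary(b"\xff")}}})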


 Comments   
Comment by Craig Leyshan [ 31/Jul/14 ]

I have a use case that would benefit greatly from a binary (byte array) type whose individual bytes can be manipulated atomically in updates: HyperLogLog. In this case the size of the array is fixed, and updates need to be able to apply the $max operator to individual bytes. Right now it is very space-inefficient to store these in MongoDB in a way that allows safe parallel updates: using an array of integers, I'm looking at roughly 7 bytes of storage for each byte of real information.

I think one way this could work is for 'byte' to become a valid BSON type, and for BSON arrays to be encoded as an element type, a length, and then a dense packing of the elements. Lastly, MongoDB's update semantics would need to support arrays better: rather than the existing "key.n" syntax, a "key[n]" syntax (or similar) could be adopted to address individual positions in the array. For example, with the current syntax, an upsert that needs to insert makes MongoDB think you want a document, not an array.
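
For concreteness, the space-inefficient workaround described above looks something like this in PyMongo (the collection/field names and register count are illustrative, not from the ticket):

# Workaround available today: store HLL registers as a plain BSON array of
# ints so individual positions can be raised atomically with $max.
from pymongo import MongoClient

coll = MongoClient().test.hll
m = 1024  # number of HyperLogLog registers

# Initialize a fixed-size register array on first touch; $setOnInsert sidesteps
# the upsert pitfall where a dotted path like "regs.42" would create a document.
coll.update_one({"_id": "sketch1"},
                {"$setOnInsert": {"regs": [0] * m}},
                upsert=True)

# Atomically raise register 42 to at least 17 via "key.n" dotted addressing.
coll.update_one({"_id": "sketch1"}, {"$max": {"regs.42": 17}})

Each register costs a full BSON array element (type byte, string key, int32 value) rather than one byte, which is the roughly 7x overhead described above.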

Comment by Kevin J. Rice [ 20/Aug/13 ]

To be very specific, we can significantly reduce our data footprint (and thus increase performance) by replacing our data model of:

{ ... 'vals': [ [1,2], [3,4], [5,6], ... ], ... }

with one where the vals are packed into a bit field and we can append bits at one end and remove bits from the other. Our immediate need is appending to one end and removing from the other, to cap the amount of time-series data kept in the bitfield. Appending and removing from the same end is nearly useless in our use case, as we need a queue, not a stack (though I can readily see how some people would need a stack).

{ ...  'vals' : BinaryObject ...}

Because BSON has to encode the values as nested arrays like [ [1,2], [2,3] ], each element carries its own type byte and key name, consuming (for floats) about 30 bytes per numeric pair. With bitfield packing, the size per point could drop to as little as 8 bytes per numeric pair (2 x 32-bit floats). This is a HUGE saving per datapoint, and would mean we could keep more data in memory at once, giving faster reads and faster updates since the working set is more likely to fit in memory.
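
As a rough sketch of that packing (the pair values are illustrative), Python's struct can already produce the dense representation client-side:

# Dense client-side packing of (float, float) pairs: 8 bytes per pair with
# 32-bit floats, versus roughly 30 bytes per pair as nested BSON arrays.
import struct
from bson import Binary

pairs = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
packed = Binary(b"".join(struct.pack("<ff", a, b) for a, b in pairs))
assert len(packed) == 8 * len(pairs)

What is missing is a server-side way to append to or trim such a field atomically, which is what this ticket asks for.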

I would wager that this is a common need among MongoDB users: the ability to pack data more tightly to reduce storage size.

These bit-field operators would be fairly low-level, and thus (on the surface) would seem fairly easy to implement.

Any other use cases out there?

Comment by Kevin J. Rice [ 03/Apr/13 ]

This would allow much tighter packing of data for us. Instead of $push onto

{ ... 'arrayName': [ [1,2], [3,4], ... ], ... }

I could use Python's struct to binary-encode 5 and 6 into a two-byte value (or, in our case, two doubles), wrap the result with bson.Binary(), and then append it to an existing

{ ... 'arrayName' : BinaryObj(...) }

object. This would change from 30 bytes per datapoint pair to 16 bytes per datapoint pair.

I'm presuming the BSON libraries support appending binary data to an existing binary object. Likewise, we'd need to be able to resize binary data (remove bytes from the front or end).
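
To make the gap concrete: the only way to do this today is a client-side read-modify-write, which is not atomic under concurrent writers. A sketch (collection and field names are illustrative):

# Client-side append and front-trim on a packed binary field. This is a
# read-modify-write, so it is NOT safe under concurrent updates; server-side
# append/slice operators would make it a single atomic update.
import struct
from bson import Binary
from pymongo import MongoClient

coll = MongoClient().test.series
PAIR = struct.calcsize("<dd")  # two doubles = 16 bytes per datapoint pair

doc = coll.find_one({"_id": 1})
buf = bytes(doc["arrayName"]) + struct.pack("<dd", 5.0, 6.0)  # append one pair

N = 1000                  # keep at most the N most recent pairs
buf = buf[-N * PAIR:]     # trim from the front

coll.update_one({"_id": 1}, {"$set": {"arrayName": Binary(buf)}})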
