[SERVER-60461] Add sub-object compression support to BSONColumnBuilder Created: 05/Oct/21  Updated: 29/Oct/23  Resolved: 20/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0

Type: Task Priority: Major - P3
Reporter: Henrik Edin Assignee: Henrik Edin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2021-10-18, Execution Team 2021-11-01
Participants:

 Comments   
Comment by Githook User [ 20/Oct/21 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: SERVER-60461 Compression support for sub-objects in BSON Column (type 7)

BSONColumnBuilder refactored to put the compression logic in an EncodingState class, so that multiple encoding states can exist at once. This is needed for sub-object compression, where each scalar sub-field is compressed individually.
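
As a rough illustration of why the refactor needs one EncodingState per sub-field, the Python sketch below keeps a separate state for each scalar path of an object. It is a minimal sketch under stated assumptions, not the server's C++ code. The names `EncodingState`, `SubObjectBuilder` and `leaves` are hypothetical. Scalars are assumed to be integers compressed with plain delta encoding. The list of paths is assumed to come from an already determined reference object. A missing field is recorded as `None` to stand in for a skip.

```python
class EncodingState:
    """Hypothetical per-stream encoder: delta-encodes integers."""

    def __init__(self):
        self.prev = None   # last non-skipped value seen on this stream
        self.deltas = []   # encoded output; None marks a skip

    def append(self, value):
        if value is None:
            self.deltas.append(None)  # field missing in this object
            return
        # The first value is stored as-is, later ones as a difference.
        self.deltas.append(value if self.prev is None else value - self.prev)
        self.prev = value


def leaves(obj, prefix=""):
    """Yield (dotted_path, value) for every scalar sub-field of obj."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from leaves(value, path + ".")
        else:
            yield path, value


class SubObjectBuilder:
    """Holds one independent EncodingState per scalar path (hypothetical)."""

    def __init__(self, paths):
        self.states = {path: EncodingState() for path in paths}

    def append(self, obj):
        values = dict(leaves(obj))
        for path, state in self.states.items():
            state.append(values.get(path))


builder = SubObjectBuilder(["a", "b.x"])
builder.append({"a": 10, "b": {"x": 1}})
builder.append({"a": 12, "b": {"x": 4}})
builder.append({"a": 15})
print(builder.states["a"].deltas)    # [10, 2, 3]
print(builder.states["b.x"].deltas)  # [1, 3, None]
```

Keeping the states independent means each stream only ever compares a sub-field with its own previous value, which is what lets per-field deltas stay small.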

The compression of Objects is done in two phases:
1. Determine the optimal reference object
2. Compress the objects using the reference object

To determine the reference object, we traverse each incoming Object to be compressed and check that it matches the BSON hierarchy of the reference object determined so far. To match, fields must be in the same order and must not differ in whether they are of Object type or whether they are an empty object. Fields may be missing from the object to compress compared to the reference object.
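
The matching rule above can be sketched as a recursive check. This is a hypothetical Python helper (`matches`) that models BSON objects as insertion-ordered dicts. It only checks compatibility and returns `False` when the object contains a field the reference lacks, which is the kind of change the following paragraph handles by merging.

```python
def matches(obj: dict, ref: dict) -> bool:
    """Return True if obj fits the BSON hierarchy of ref (hypothetical helper).

    Fields of obj must appear in ref in the same relative order, but obj may
    omit fields that ref has.
    """
    ref_keys = list(ref)
    pos = 0  # fields of obj may only match reference fields at or after pos
    for key, value in obj.items():
        try:
            idx = ref_keys.index(key, pos)
        except ValueError:
            # Field is absent from the reference or appears out of order.
            return False
        pos = idx + 1
        ref_value = ref[key]
        # Both sides must agree on being an Object or not.
        if isinstance(value, dict) != isinstance(ref_value, dict):
            return False
        if isinstance(value, dict):
            # Both sides must agree on being an empty object or not.
            if (len(value) == 0) != (len(ref_value) == 0):
                return False
            if value and not matches(value, ref_value):
                return False
    return True


reference = {"a": 1, "b": {"x": 1, "y": 2}, "c": {}}
print(matches({"a": 5, "c": {}}, reference))        # True: "b" is simply missing
print(matches({"b": {"y": 3}, "a": 1}, reference))  # False: wrong field order
```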

When a compatible change is detected, a new reference object is built by merging in the change. If an incompatible change is detected, we end the previous sub-object compression and start over.
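
A sketch of the merge-or-restart step, again with hypothetical names (`merge`, `build_references`) and dicts standing in for BSON objects. The merge walks both field lists in order, inserting fields that are new to the reference and keeping fields that are missing from the incoming object. An ordering conflict, an Object/scalar mismatch or an empty/non-empty mismatch makes the change incompatible, and `build_references` then closes the current run and starts a new reference from the incoming object. Scalar values in the reference are only placeholders here, and this is an assumed algorithm, not the server's actual merge code.

```python
_INCOMPATIBLE = object()  # sentinel for an incompatible change


def _merge_value(ref_value, value):
    ref_is_obj, is_obj = isinstance(ref_value, dict), isinstance(value, dict)
    if ref_is_obj != is_obj:
        return _INCOMPATIBLE  # Object vs. non-Object
    if not ref_is_obj:
        return ref_value  # scalar: keep the reference placeholder
    if (len(ref_value) == 0) != (len(value) == 0):
        return _INCOMPATIBLE  # empty vs. non-empty object
    if not ref_value:
        return {}
    merged = merge(ref_value, value)
    return _INCOMPATIBLE if merged is None else merged


def merge(ref: dict, obj: dict):
    """Merge obj into ref, preserving both field orders, or return None."""
    ref_keys, obj_keys = list(ref), list(obj)
    out = {}
    i = j = 0
    while i < len(ref_keys) and j < len(obj_keys):
        rk, ok = ref_keys[i], obj_keys[j]
        if rk == ok:
            merged = _merge_value(ref[rk], obj[ok])
            if merged is _INCOMPATIBLE:
                return None
            out[rk] = merged
            i += 1
            j += 1
        elif ok not in ref:  # new field: insert it at this position
            out[ok] = obj[ok]
            j += 1
        elif rk not in obj:  # field missing from obj: keep it
            out[rk] = ref[rk]
            i += 1
        else:  # both fields exist on both sides but in a different order
            return None
    for rk in ref_keys[i:]:
        out[rk] = ref[rk]
    for ok in obj_keys[j:]:
        out[ok] = obj[ok]
    return out


def build_references(objs):
    """Group objs into runs that share one reference object."""
    runs = []
    ref, run = None, []
    for obj in objs:
        merged = None if ref is None else merge(ref, obj)
        if merged is None:
            if ref is not None:
                runs.append((ref, run))  # end previous sub-object compression
            ref, run = obj, [obj]  # start over with this object
        else:
            ref = merged
            run.append(obj)
    if ref is not None:
        runs.append((ref, run))
    return runs


print(list(merge({"a": 1, "c": 3}, {"b": 2, "c": 4})))  # ['b', 'a', 'c']
print(merge({"a": 1, "b": 2}, {"b": 1, "a": 2}))        # None
```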

The control bytes in the BSON Column binary are interleaved; each belongs to the EncodingState of a separate sub-field stream. They are written in the order they are needed for decompression: when a DecodingState has depleted the values in its current control block, it reads the next interleaved block from the binary.
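
To show why writing control blocks in the order the decoder needs them is enough for the decoder to find its data, here is a small simulation. It is a hypothetical model, not the real binary format: each stream is a list of non-empty blocks of already encoded values, every stream holds one entry per object (skips included), and the "binary" is just a list of `(stream_index, block)` pairs. The writer replays the decoder's round-robin reads, so a block is emitted exactly when its stream runs dry.

```python
from collections import deque


def interleave(streams):
    """Order per-stream blocks the way a round-robin decoder will need them.

    streams: list of streams, each a list of non-empty blocks (lists of values).
    Assumes every stream holds the same total number of values.
    """
    num_values = sum(len(block) for block in streams[0])
    next_block = [0] * len(streams)
    remaining = [0] * len(streams)  # unread values in each stream's current block
    binary = []
    for _ in range(num_values):
        for s, blocks in enumerate(streams):
            if remaining[s] == 0:  # decoder would read this stream's next block now
                block = blocks[next_block[s]]
                next_block[s] += 1
                remaining[s] = len(block)
                binary.append((s, block))
            remaining[s] -= 1
    return binary


def decode(binary, num_streams, num_values):
    """Read blocks sequentially; each stream pulls a new block when depleted."""
    pending = [deque() for _ in range(num_streams)]
    it = iter(binary)
    rows = []
    for _ in range(num_values):
        row = []
        for s in range(num_streams):
            if not pending[s]:
                owner, block = next(it)
                assert owner == s, "control block out of order"
                pending[s].extend(block)
            row.append(pending[s].popleft())
        rows.append(row)
    return rows


streams = [[[1, 2, 3], [4]], [[10], [20, 30, 40]]]
binary = interleave(streams)
print(binary)  # [(0, [1, 2, 3]), (1, [10]), (1, [20, 30, 40]), (0, [4])]
print(decode(binary, 2, 4))  # [[1, 10], [2, 20], [3, 30], [4, 40]]
```

Because the writer and the decoder follow the same read schedule, the decoder never needs per-block stream tags in this model; the `owner` check is only there to make the invariant visible.
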
Branch: master
https://github.com/mongodb/mongo/commit/e3debd38ca358a62e17a4cd7031a40f4a280089f

Generated at Thu Feb 08 05:49:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.