[SERVER-60461] Add sub-object compression support to BSONColumnBuilder Created: 05/Oct/21 Updated: 29/Oct/23 Resolved: 20/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Henrik Edin | Assignee: | Henrik Edin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2021-10-18, Execution Team 2021-11-01 |
| Participants: |
| Comments |
| Comment by Githook User [ 20/Oct/21 ] |
|
Author: {'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'} Message: BSONColumnBuilder is refactored to move the compression logic into an EncodingState class. This allows multiple encoding states to be active at once, which sub-object compression requires because each scalar sub-field is compressed individually. The compression of Objects is done in two phases. To determine the reference object, we traverse the incoming Objects to be compressed and check that they match the BSON hierarchy of the reference object determined so far. To match, fields must be in the same order and must not differ in whether they are of Object type or are an empty object. Fields may be missing from the object to compress relative to the reference object. When a compatible change is detected, a new reference object is built by merging in the change. If an incompatible change is detected, we end the previous sub-object compression and start over. The control bytes in the BSON Column binary are interleaved and belong to separate EncodingStates, one per sub-field stream. They are written in the order in which they are needed for decompression. When a DecodingState has depleted the values in its current control block, it reads the next interleaved block from the binary. |
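The matching and merging rules in the commit message can be illustrated with a minimal sketch. This is not the server's C++ implementation: Python dicts stand in for BSON objects, and the names `kind` and `merge_reference` are hypothetical. The sketch assumes a new field in the incoming object is merged in at the position where it is encountered, and that a merged scalar keeps the reference object's value; the commit message does not specify either detail.

```python
from typing import Any, Optional


def kind(value: Any) -> str:
    """Classify a value as 'object', 'empty_object', or 'scalar'."""
    if isinstance(value, dict):
        return "object" if value else "empty_object"
    return "scalar"


def merge_reference(reference: dict, obj: dict) -> Optional[dict]:
    """Return a reference object that also covers obj, or None if incompatible.

    Hypothetical sketch: dicts (insertion-ordered) stand in for BSON objects.
    """
    ref_items = list(reference.items())
    merged: dict = {}
    pos = 0  # next reference field that has not been consumed yet
    for name, value in obj.items():
        # Look for this field at or after the current reference position.
        match = next(
            (k for k in range(pos, len(ref_items)) if ref_items[k][0] == name),
            None,
        )
        if match is None:
            if name in reference:
                # Field exists earlier in the reference: order differs.
                return None
            # Field is new: merge it in at this position (assumption).
            merged[name] = value
            continue
        # Reference fields skipped here are simply missing in obj.
        merged.update(ref_items[pos:match])
        ref_value = ref_items[match][1]
        if kind(ref_value) != kind(value):
            # Differs on being an Object or an empty object.
            return None
        if kind(value) == "object":
            sub = merge_reference(ref_value, value)
            if sub is None:
                return None
            merged[name] = sub
        else:
            merged[name] = ref_value
        pos = match + 1
    # Trailing reference fields are missing in obj, which is allowed.
    merged.update(ref_items[pos:])
    return merged
```

A caller would keep the returned object as the new reference while it is not `None`. A `None` result corresponds to the incompatible case, where the previous sub-object compression ends and a new one starts with the incoming object as the reference.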