[SERVER-65658] Remove unnecessary null byte from columnar storage format Created: 14/Apr/22  Updated: 24/Jan/23  Resolved: 24/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Ian Boros
Assignee: Backlog - Query Execution
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Sprint: QE 2022-09-19
Participants:

 Description   

For certain values in the columnar index, we store a BSONElement with an empty field name directly. The overhead of this empty field name is one byte (for the null terminator). There's no reason for us to store this null byte once per value, and for the case where many values are numeric/scalar, the space savings could be 10-20%.
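As a rough sanity check on the 10-20% figure, here is a minimal sketch that builds BSON elements with an empty field name and measures how much of each element the null terminator accounts for. It assumes the value is stored as a plain BSON element (type byte, NUL-terminated field name, payload) and ignores any per-cell or storage-engine overhead, so it only approximates what the columnar index would actually save. The helper names `encode_element` and `savings_without_null` are hypothetical and not taken from the server code.

```python
import struct


def encode_element(type_byte: int, name: str, payload: bytes) -> bytes:
    """Encode one BSON element: type byte, NUL-terminated field name, payload."""
    return bytes([type_byte]) + name.encode("utf-8") + b"\x00" + payload


def savings_without_null(elem: bytes) -> float:
    """Fraction of the element saved by dropping the 1-byte empty field name."""
    return 1 / len(elem)


# BSON type tags: 0x10 = int32, 0x12 = int64, 0x01 = double (little-endian payloads).
int32_elem = encode_element(0x10, "", struct.pack("<i", 42))
int64_elem = encode_element(0x12, "", struct.pack("<q", 42))
double_elem = encode_element(0x01, "", struct.pack("<d", 42.0))

for label, elem in [("int32", int32_elem), ("int64", int64_elem), ("double", double_elem)]:
    print(f"{label}: {len(elem)} bytes, {savings_without_null(elem):.1%} saved")
```

Under these assumptions an int32 element is 6 bytes (about 16.7% saved) and an int64 or double element is 10 bytes (10% saved), which lines up with the estimate above for numeric-heavy data.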



 Comments   
Comment by Ian Boros [ 24/Jan/23 ]

We decided the improvement here would not be worth it.

Comment by Charlie Swanson [ 01/Sep/22 ]

I believe this byte is coming from here on the encoding side: https://github.com/mongodb/mongo/blob/04ea05b120dfbecccb01cf1d1363f25d9164abaf/src/mongo/db/index/column_cell.cpp#L164-L168. I think the decoding logic is this part: https://github.com/mongodb/mongo/blob/04ea05b120dfbecccb01cf1d1363f25d9164abaf/src/mongo/db/storage/column_store.h#L500-L507.

Unfortunately, reporting the size of the index is not implemented yet (see https://github.com/mongodb/mongo/blob/04ea05b120dfbecccb01cf1d1363f25d9164abaf/src/mongo/db/storage/wiredtiger/wiredtiger_column_store.cpp#L428-L440), so it may be hard to quantify the impact of this change. We could consider reviving the code that was in the POC branch and resolving that TODO as part of this ticket: https://github.com/10gen/mongo/blob/fc24ae382632fd385c325ce72c01376b70d3d966/src/mongo/db/storage/wiredtiger/wiredtiger_column_store.cpp#L551-L597

I'm not sure whether that code will need to change much, if at all, from what was in the POC branch. It looks pretty reasonable to me at first glance, but someone from the server storage execution team may know whether it is appropriate.

Generated at Thu Feb 08 06:03:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.