[SERVER-65658] Remove unnecessary null byte from columnar storage format Created: 14/Apr/22 Updated: 24/Jan/23 Resolved: 24/Jan/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Backlog - Query Execution |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Sprint: | QE 2022-09-19 |
| Participants: |
| Description |
|
For certain values in the columnar index, we store a BSONElement with an empty field name directly. The overhead of this empty field name is one byte (for the null terminator). There's no reason for us to store this null byte once per value, and for the case where many values are numeric/scalar, the space savings could be 10-20%. |
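The 10-20% estimate can be sanity-checked with some quick arithmetic. The sketch below is illustrative only and is not the server's encoding code. It assumes each such value is stored as a bare BSON element in the layout given by the BSON specification (one type byte, a NUL-terminated field name, then the payload) and that nothing else in the cell is counted. The helper names are hypothetical.

```python
import struct

# BSON type tags from the BSON specification.
BSON_DOUBLE = b"\x01"
BSON_INT32 = b"\x10"
BSON_INT64 = b"\x12"


def bson_element(type_tag: bytes, name: str, payload: bytes) -> bytes:
    """Lay out one BSON element: type byte, cstring field name, payload."""
    return type_tag + name.encode("utf-8") + b"\x00" + payload


def savings_if_terminator_dropped(element: bytes) -> float:
    """Fraction of the element's size taken up by the one-byte name terminator."""
    return 1 / len(element)


# Elements with an empty field name, as described in this ticket.
int32_elem = bson_element(BSON_INT32, "", struct.pack("<i", 42))
int64_elem = bson_element(BSON_INT64, "", struct.pack("<q", 42))
double_elem = bson_element(BSON_DOUBLE, "", struct.pack("<d", 42.0))

for label, elem in [("int32", int32_elem), ("int64", int64_elem), ("double", double_elem)]:
    print(f"{label}: {len(elem)} bytes, savings {savings_if_terminator_dropped(elem):.1%}")
```

Under these assumptions an int32 element is 6 bytes and an int64 or double element is 10 bytes, so dropping the terminator would save roughly 17% and 10% respectively, which is consistent with the range quoted above. Any additional per-cell overhead in the real format would reduce the relative savings.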
| Comments |
| Comment by Ian Boros [ 24/Jan/23 ] |
|
We decided the improvement here would not be worth it. |
| Comment by Charlie Swanson [ 01/Sep/22 ] |
|
I believe this byte is coming from here on the encoding side: https://github.com/mongodb/mongo/blob/04ea05b120dfbecccb01cf1d1363f25d9164abaf/src/mongo/db/index/column_cell.cpp#L164-L168. I think the decoding logic is this part: https://github.com/mongodb/mongo/blob/04ea05b120dfbecccb01cf1d1363f25d9164abaf/src/mongo/db/storage/column_store.h#L500-L507. Unfortunately, reporting the size of the index is not implemented yet (see https://github.com/mongodb/mongo/blob/04ea05b120dfbecccb01cf1d1363f25d9164abaf/src/mongo/db/storage/wiredtiger/wiredtiger_column_store.cpp#L428-L440), so it may be hard to quantify the impact of this change. We could consider reviving the code that was in the POC branch and resolving that TODO as part of this ticket: https://github.com/10gen/mongo/blob/fc24ae382632fd385c325ce72c01376b70d3d966/src/mongo/db/storage/wiredtiger/wiredtiger_column_store.cpp#L551-L597. I'm not sure whether the code will have to change much, if at all, from what was in the POC branch. It looks pretty reasonable to me at first glance, but someone from the server storage execution team may know whether that code is appropriate. |