Details

Type: Improvement

Status: Closed

Priority: Major - P3

Resolution: Community Answered

Affects Version/s: None

Fix Version/s: None

Component/s: None

Labels: None

Description
I currently have the following code for inserting a 256-bit password hash into a password collection (`pswd_db.storage`):
```rust
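use mongodb::bson::{doc, spec::BinarySubtype, Binary};

// NOTE: the import list and the `Binary` literal were truncated in the original
// snippet. `hash_bytes` is a placeholder name for the 32-byte hash (`Vec<u8>`).
let foo = pswd_db
    .storage
    .insert_one(
        doc! {
            "email": auth_data.email,
            // Generic subtype, matching `"$type":"0"` in the stored document.
            "hashed": Binary {
                subtype: BinarySubtype::Generic,
                bytes: hash_bytes,
            },
        },
        None,
    )
    .await;
println!("{:?}", foo);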
```
This performs a successful insertion, with the document looking like this:
```text
{"_id":...,"email":"fizzbuzz.com","hashed":{"$binary":"XMkoWpYRAurzZu0et6bAOir8mcVDGrZsbdH74RbAT/8=","$type":"0"}}
```
Note: the `hashed` field encodes both the binary data and the binary subtype. This appears to be done by the `serde::Serialize` macro.
The redundant data here is the marker saying that the value is a `Binary`, plus the integer saying that it uses the `BinarySubtype::Generic` variant. Given that `BinarySubtype` has 8 variants, this is a theoretical minimum overhead of 3 bits per entry (though in practice it is likely 8).
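To put numbers on that overhead, here is a minimal sketch using only the standard library. It assumes the layout given in the BSON specification for a binary element: a one-byte element type (`0x05`), the key as a NUL-terminated string, a little-endian `int32` payload length, a one-byte subtype, then the payload. The function names `encode_binary_element` and `min_bits` are made up for illustration and are not part of the `bson` or `mongodb` crates.

```rust
/// Hand-encode a single BSON binary element (assumed layout per the BSON spec):
/// type tag, key as cstring, int32 length, subtype byte, payload.
fn encode_binary_element(key: &str, subtype: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + key.len() + 1 + 4 + 1 + payload.len());
    out.push(0x05); // element type tag for "binary data"
    out.extend_from_slice(key.as_bytes());
    out.push(0x00); // cstring terminator for the key
    out.extend_from_slice(&(payload.len() as i32).to_le_bytes()); // payload length
    out.push(subtype); // one full byte for the subtype (0x00 = generic)
    out.extend_from_slice(payload);
    out
}

/// Minimum number of bits needed to distinguish `variants` alternatives,
/// i.e. ceil(log2(variants)). Requires `variants >= 1`.
fn min_bits(variants: u32) -> u32 {
    u32::BITS - (variants - 1).leading_zeros()
}
```

Under these assumptions, `encode_binary_element("hashed", 0x00, &[0u8; 32])` produces 45 bytes, of which exactly one byte is the subtype, while `min_bits(8)` gives the 3-bit theoretical minimum mentioned above. The `$binary` and `$type` keys in the printed document would then be a textual rendering of these fields rather than extra stored strings.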
The following is speculation, since I don't know the internals of MongoDB.
Assuming that the `$binary` component of the document is encoded internally as an integer, the 3 bits of information encoding which `BinarySubtype` variant is being stored could be folded into the value that encodes the `$binary` part.
There are two theoretical caveats to this that come to mind (I'm sure there's a lot more complexity going on). First: the number of values already in use where `$binary` is encoded must be at most `2^$bits - 7` (i.e. it can accommodate the 8 new values that would replace the single `$binary` value). Second: if the first doesn't hold, adding an additional bit where `$binary` is currently encoded is an acceptable cost for removing what currently encodes which variant of `$binary` is at play.
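The first caveat is a simple capacity check, sketched below. The function name `can_fold_subtype` and its parameters are hypothetical, and the number of type values actually in use by the server is not taken from any MongoDB source; it is just an input here.

```rust
/// Returns true if a tag field of `tag_bits` bits, which currently uses
/// `used_tags` distinct values (one of them meaning "binary"), has room to
/// replace that single value with `variants` subtype-specific values.
/// Requires `used_tags >= 1` and `tag_bits < 64`.
fn can_fold_subtype(tag_bits: u32, used_tags: u64, variants: u64) -> bool {
    let capacity = 1u64 << tag_bits; // 2^tag_bits representable values
    // One existing value is replaced by `variants` new ones.
    used_tags - 1 + variants <= capacity
}
```

With an 8-bit tag and 8 variants, the check passes as long as at most 249 values are already in use, which is the `2^$bits - 7` bound. If it fails, the second caveat applies and the tag field would need one more bit.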