  1. Rust Driver
  2. RUST-528

Enumerated `Binary` type serialization stores redundant data that can be trimmed.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Community Answered
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I currently use the following code to insert a 256-bit password hash into a password collection (`pswd_db.storage`):

      ```rust
      use mongodb::bson::{doc, spec, Binary};

      // Insert the email together with the hash, stored as a generic BSON binary.
      let foo = pswd_db
          .storage
          .insert_one(
              doc! {
                  "email": auth_data.email,
                  "hashed": Binary {
                      subtype: spec::BinarySubtype::Generic,
                      bytes: to_store.0.to_vec(),
                  },
              },
              None,
          )
          .await;
      println!("{:?}", foo);
      ```
      This performs a successful insertion, with the document looking like this:

      ```text
      {"_id":{"$oid":"5f267ee4003bb11b0031b809"},"email":"fizzbuzz.com","hashed":{"$binary":"XMkoWpYRAurzZu0et6bAOir8mcVDGrZsbdH74RbAT/8=","$type":"0"}}
      ```
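
      As a quick sanity check on the document above, the length of the `$binary` string can be converted back into a byte count. This is a minimal sketch that assumes standard padded Base64; `base64_decoded_len` is a hypothetical helper written for this illustration, not part of the `bson` or `mongodb` crates.

      ```rust
      /// Number of bytes represented by a padded Base64 string.
      /// Hypothetical helper for illustration; it only inspects the length
      /// and the trailing '=' padding, and does not validate the alphabet.
      fn base64_decoded_len(encoded: &str) -> Option<usize> {
          // Padded Base64 always comes in groups of 4 characters.
          if encoded.len() % 4 != 0 {
              return None;
          }
          // Count the trailing '=' characters (valid values are 0, 1 or 2).
          let padding = encoded.bytes().rev().take_while(|&b| b == b'=').count();
          if padding > 2 {
              return None;
          }
          // Each group of 4 characters carries 3 bytes, minus the padding.
          Some(encoded.len() / 4 * 3 - padding)
      }
      ```

      Applied to the 44-character value stored in `hashed`, this gives 32 bytes, which matches the 256-bit hash.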

      Note: the `hashed` field encodes both the binary and the binary sub-type. This appears to be done by the `serde::Serialize` macro.
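
      For reference, the JSON shown above is only a textual rendering. As far as the BSON specification describes it, a binary element on the wire is a one-byte element type (`0x05`), the field name as a NUL-terminated string, a little-endian `int32` payload length, a one-byte subtype, and then the payload. The sketch below hand-rolls that layout using only the standard library; `encode_binary_element` is a hypothetical name for illustration and is not how the driver itself is implemented.

      ```rust
      /// Hand-rolled encoding of a single BSON binary element (illustration only).
      /// Assumes the payload length fits in an i32, as the BSON format requires.
      fn encode_binary_element(key: &str, subtype: u8, payload: &[u8]) -> Vec<u8> {
          let mut out = Vec::with_capacity(1 + key.len() + 1 + 4 + 1 + payload.len());
          out.push(0x05); // element type: binary
          out.extend_from_slice(key.as_bytes()); // field name
          out.push(0x00); // terminator of the field name
          out.extend_from_slice(&(payload.len() as i32).to_le_bytes()); // payload length
          out.push(subtype); // subtype byte (0x00 is the generic subtype)
          out.extend_from_slice(payload); // raw bytes
          out
      }
      ```

      Under these assumptions, a 32-byte hash stored under the key `hashed` takes 45 bytes, of which exactly one byte is the subtype.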

      The redundant data here is the information that the value is a `Binary`, plus the integer indicating that it is the `BinarySubtype::Generic` variant. Given that the subtype enum (`spec::BinarySubtype`) has 8 variants, this is a theoretical minimum overhead of 3 bits per entry (though in practice it is likely 8).
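
      The 3-bit figure is just the ceiling of the base-2 logarithm of the number of variants. A small sketch of that arithmetic, where `min_tag_bits` is a hypothetical helper name:

      ```rust
      /// Minimum number of bits needed to distinguish `variants` distinct values,
      /// i.e. ceil(log2(variants)). Zero or one variant needs no bits at all.
      fn min_tag_bits(variants: u32) -> u32 {
          match variants {
              0 | 1 => 0,
              // The bit length of (n - 1) equals ceil(log2(n)) for n >= 2.
              n => 32 - (n - 1).leading_zeros(),
          }
      }
      ```

      With 8 variants this evaluates to 3, and a full byte (8 bits) would be enough for up to 256 variants.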

      Note that I don't know the internals of MongoDB, so the following is speculation.

      Assuming that the `$binary` component of the document is encoded internally as an integer, the 3 bits that identify which `BinarySubtype` variant is being stored could be folded into the value that encodes the `$binary` part.

      Two theoretical caveats to this come to mind (I'm sure there is a lot more complexity going on). First: this only works if the number of values currently used in the slot that encodes `$binary` is at most `2^$bits - 7`, i.e. the slot can accommodate the 8 new values that would replace the single `$binary` value. Second: if the first condition does not hold, adding one more bit to the slot where `$binary` is currently encoded would be a suitable cost for removing whatever currently encodes the variant in play.
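
      The first caveat can be written down as a simple capacity check. This is only a sketch of the reasoning, under the assumption that the type slot is a fixed-width integer of `bits` bits with `used` values already assigned (one of them being the plain binary tag); `can_fold` and its parameters are hypothetical names, not anything from MongoDB or the driver.

      ```rust
      /// Returns true if a tag slot of `bits` bits, with `used` values already
      /// assigned (one of which is the plain "binary" tag, so `used >= 1`), has
      /// room to replace that one tag with `variants` subtype-specific tags.
      fn can_fold(bits: u32, used: u64, variants: u64) -> bool {
          // Total number of distinct values the slot can hold (widths capped at 64 bits).
          let capacity = 1u128 << bits.min(64);
          // Replacing one tag with `variants` tags needs used - 1 + variants values,
          // rearranged here to avoid subtraction on unsigned integers.
          (used as u128) + (variants as u128) <= capacity + 1
      }
      ```

      For an 8-bit slot and 8 variants, the check passes for up to 249 values in use and fails from 250 onwards, at which point the second caveat (adding a bit) would apply.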

            People

            Assignee:
            sam.rossi Samuel Rossi (Inactive)
            Reporter:
            benphawke@gmail.com Ben Ph