Uploaded image for project: 'Documentation'
  1. Documentation
  2. DOCS-15103

[SERVER] Add documentation for BSONColumn (Binary subtype 7) to bsonspec.org

      Summary

      Document the contents of the BSONColumn (binary subtype 7) format.

      Motivation

      Affected users are application developers and advanced users directly accessing system.buckets.X collections containing time-series data.

      By documenting our format in a public place, we encourage new innovative uses of this format, including the development of efficient third-party tools to import and export time-series data.

      How does this affect the end user?

      The initial loading of data into MongoDB is currently very slow, as it has to be done measurement by measurement through the command interface. With a tool that converts, for example,{{ .csv}} files directly to time-series buckets using the BSONColumn format, we could significantly speed up import of large amounts of time-series data.

      The user may feel more confident knowing their data is stored in a documented format that is part of the BSON specification. Even without tools to directly read BSONColumn data, just the fact that the format is documented may alleviate discomfort with not being able to directly "see" how the information is stored.

      How likely is it that this problem or use case will occur?

      This documentation will be useful for MongoDB internally (for Technical Support, for example). It is somewhat likely that a user will look at their system.buckets collections for their time-series data and be concerned that their data is stored in some opaque format that no standard tools know about. 

      If the problem does occur, what are the consequences and how severe are they?

      **If users feel uncomfortable storing their data in MongoDB's time-series collections, that will limit our addressable market, or make feel customers uncomfortable with BSON as a format altogether. 

      Is this issue urgent?

      Urgency is debatable, but the MongoDB 6.0 GA version is going to be the first LTS release using BSON columnar compression by default, so I think that we should document the data format by then.

      Is this ticket required by a downstream team?

      I do not think this ticket is directly required by a downstream team, though it could present an opportunity for performance optimization or introspection and diagnostics around storage efficiency and compression ratios.

      Is this ticket only for tests?

      This ticket is important for production.

            Assignee:
            rea.rustagi@mongodb.com Rea Rustagi
            Reporter:
            geert.bosch@mongodb.com Geert Bosch
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved:
              6 weeks ago