[DOCS-15103] [SERVER] Add documentation for BSONColumn (Binary subtype 7) to bsonspec.org Created: 10/Feb/22  Updated: 01/Feb/24

Status: Backlog
Project: Documentation
Component/s: drivers, manual
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Geert Bosch
Assignee: Rea Rustagi
Resolution: Unresolved
Votes: 0
Labels: backlog, feature, release
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Participants:
Days since reply: 6 days ago
Epic Link: DOCSP-11702

 Description   

Summary

Document the contents of the BSONColumn (binary subtype 7) format.

Motivation

Affected users are application developers and advanced users directly accessing system.buckets.X collections containing time-series data.

By documenting our format in a public place, we encourage new innovative uses of this format, including the development of efficient third-party tools to import and export time-series data.

How does this affect the end user?

The initial loading of data into MongoDB is currently very slow, because it has to be done measurement by measurement through the command interface. With a tool that converts, for example, .csv files directly to time-series buckets using the BSONColumn format, we could significantly speed up the import of large amounts of time-series data.

The user may feel more confident knowing their data is stored in a documented format that is part of the BSON specification. Even without tools to directly read BSONColumn data, just the fact that the format is documented may alleviate discomfort with not being able to directly "see" how the information is stored.

How likely is it that this problem or use case will occur?

This documentation will be useful within MongoDB (for Technical Support, for example). It is somewhat likely that a user will look at the system.buckets collections for their time-series data and be concerned that their data is stored in an opaque format that no standard tools understand.

If the problem does occur, what are the consequences and how severe are they?

If users feel uncomfortable storing their data in MongoDB's time-series collections, that will limit our addressable market, or make customers uncomfortable with BSON as a format altogether.

Is this issue urgent?

Urgency is debatable, but the MongoDB 6.0 GA version is going to be the first LTS release using BSON columnar compression by default, so I think that we should document the data format by then.

Is this ticket required by a downstream team?

I do not think this ticket is directly required by a downstream team, though it could present an opportunity for performance optimization or introspection and diagnostics around storage efficiency and compression ratios.

Is this ticket only for tests?

This ticket is important for production.



 Comments   
Comment by Henrik Edin [ 01/Feb/24 ]

Exciting! I made a first draft that I have not been able to find time to finish. You can find my draft PR here: https://github.com/10gen/mongo/pull/12458 (in a different location to avoid a public PR), and you can see how it renders here: https://github.com/10gen/mongo/blob/henrikedin/WRITING-10967.bsoncolumn/docs/subtype7.rst

A few minor things have changed in the format since I wrote that, so please sync up with me!

Comment by Geert Bosch [ 01/Feb/24 ]

Hey henrik.edin@mongodb.com, Rea is working on this ticket, but I believe you have a better start for this than what's in the design doc. Is that right? If so, could you share it?

Comment by Bernie Hackett [ 10/Feb/22 ]

I moved this to docs because I'm not sure who owns bsonspec.org, but it looks like a lot of people all over the company commit to the repo. https://github.com/mongodb/bsonspec.org

Generated at Thu Feb 08 08:12:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.