Type: Epic
Resolution: Won't Do
Priority: Major - P3
Component/s: None
-
None
-
None
-
None
-
None
-
None
-
None
-
Not Needed
-
None
-
None
-
None
-
None
-
None
-
None
This will have an impact on ET teams, particularly BIC and Charts.
We are currently writing the PD to determine the impact for this project and whether the pain points are shared by Cloud teams as well.
Description of Linked Ticket
Summary
Several tools, including Compass, Schema Advisor, and the Business Intelligence Connector, rely on $sample to summarize the contents of a collection. $sample can be tricky to use in a performant way, and we know of several workarounds these tools have implemented for that reason. Instead of repeatedly scanning the collection on demand, we could maintain a fixed-size sample, or perhaps a summary schema, that is kept up to date as the contents of the collection change.
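One standard way to keep a fixed-size uniform sample current as documents arrive is reservoir sampling (Algorithm R). The ticket does not say which technique would be used, so the following is only a hypothetical, insert-only sketch: `ReservoirSample` and `observe` are made-up names, and handling deletes and updates, which a real collection would need, is deliberately left out.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirSample(Generic[T]):
    """Fixed-size uniform sample over a stream of inserted items (Algorithm R).

    Hypothetical sketch: handles inserts only. Deletes and updates would need
    extra bookkeeping (or periodic re-sampling) that is not shown here.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.seen = 0  # total number of items observed so far
        self.items: List[T] = []
        self._rng = random.Random(seed)

    def observe(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Reservoir not yet full: keep every item.
            self.items.append(item)
            return
        # Otherwise keep the new item with probability capacity / seen,
        # overwriting a uniformly chosen slot.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


# Usage: simulate 10,000 inserts while keeping a 100-document sample.
sample = ReservoirSample(capacity=100, seed=42)
for i in range(10_000):
    sample.observe({"_id": i})
print(len(sample.items), sample.seen)  # 100 10000
```

Each insert costs O(1) work, which is what would make it plausible to update such a sample alongside writes rather than re-running $sample on demand.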
Motivation
Having an accurate view of a MongoDB collection's schema is critical to the correct operation of the BI Connector (and now the $sql stage in Atlas Data Lake as well).
Right now, schemas either have to be provided manually by a user or are inferred by sampling, which often fails to capture the entire schema.
The SQL engines use the schema when translating SQL to MQL. When the schema does not accurately reflect the data in the collection, the assumptions made during translation may be violated, which can cause queries to fail with runtime errors or to return incorrect results. Schema management has consistently been the #1 pain point for BI Connector users, so addressing it would improve the overall user experience for our SQL products.
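To illustrate why a schema maintained on every write can capture fields that sampling misses, here is a hypothetical sketch of a summary schema that records, for each dotted field path, the set of value types observed. `SchemaSummary`, `summarize`, `_type_name`, and the type names are assumptions for illustration only; they are not BSON's type system or the schema format of any MongoDB tool, and arrays are not descended into.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Set


def _type_name(value: Any) -> str:
    """Map a Python value to an illustrative, BSON-like type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, Mapping):
        return "object"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


class SchemaSummary:
    """Running summary: dotted field path -> set of observed type names."""

    def __init__(self) -> None:
        self.count = 0
        self.fields: Dict[str, Set[str]] = defaultdict(set)

    def observe(self, doc: Mapping[str, Any]) -> None:
        self.count += 1
        self._walk(doc, prefix="")

    def _walk(self, doc: Mapping[str, Any], prefix: str) -> None:
        for key, value in doc.items():
            path = prefix + key
            self.fields[path].add(_type_name(value))
            if isinstance(value, Mapping):
                # Recurse into embedded documents; arrays are not descended into.
                self._walk(value, prefix=path + ".")


def summarize(docs: Iterable[Mapping[str, Any]]) -> SchemaSummary:
    summary = SchemaSummary()
    for doc in docs:
        summary.observe(doc)
    return summary


# 999 "typical" documents plus one outlier with a string price and an extra field.
docs = [{"_id": i, "price": 9.99} for i in range(999)]
docs.append({"_id": 999, "price": "N/A", "discount": {"pct": 10}})

full = summarize(docs)           # updated on every write: sees the outlier
partial = summarize(docs[:100])  # stands in for a sample that happens to miss it

print(sorted(full.fields["price"]))     # ['double', 'string']
print(sorted(partial.fields["price"]))  # ['double']
print("discount.pct" in full.fields, "discount" in partial.fields)  # True False
```

A SQL translator working from `partial` would assume `price` is always numeric and would not know `discount` exists, which is exactly the kind of violated assumption described above.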
Cast of Characters
- Product Owner:
- Project Lead:
- Program Manager:
- Drivers Contact:
Documentation
Scope Document
Technical Design Document
Product Description