Investigate changes in PM-2135: Maintain sample or schema summary

    • Type: Epic
    • Resolution: Won't Do
    • Priority: Major - P3
    • Component/s: None
    • Not Needed

      Downstream Change Summary

      This will have impact on ET teams, particularly BIC and Charts.
      We are currently writing the PD to determine the impact for this project and whether the pain points are shared by Cloud teams as well.

      Description of Linked Ticket

      Epic Summary

      Summary

      Several tools, including Compass, the schema advisor, and the Business Intelligence Connector, rely on $sample to summarize the contents of a collection. $sample can be tricky to use in a performant way, and we know of several workarounds implemented by these tools. Instead of repeatedly scanning the collection on demand, we could maintain a fixed-size sample, or perhaps a summary schema, as the contents of the collection change.
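
      To make the idea of a maintained sample or summary schema concrete, here is a minimal Python sketch. It is not a proposed server design: the class names MaintainedSample and SchemaSummary are hypothetical, documents are modeled as plain dicts, only inserts are handled (updates and deletes would need extra bookkeeping), and only top-level fields are summarized. The sample uses reservoir sampling (Algorithm R), which is one standard way to keep a uniform fixed-size sample without rescanning.

```python
import random
from collections import Counter, defaultdict


class MaintainedSample:
    """Fixed-size uniform sample over a stream of inserts (Algorithm R).

    Hypothetical illustration: handles inserts only.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.seen = 0          # number of documents inserted so far
        self.docs = []         # current sample, at most `capacity` docs
        self._rng = random.Random(seed)

    def on_insert(self, doc):
        self.seen += 1
        if len(self.docs) < self.capacity:
            self.docs.append(doc)
            return
        # Keep the new document with probability capacity / seen,
        # evicting a uniformly chosen member of the current sample.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.docs[slot] = doc


class SchemaSummary:
    """Per-field counts of observed value types, updated on every insert."""

    def __init__(self):
        self.total = 0
        self.fields = defaultdict(Counter)  # field name -> type name -> count

    def on_insert(self, doc):
        self.total += 1
        for name, value in doc.items():
            self.fields[name][type(value).__name__] += 1


# Usage: feed both structures from the same stream of inserts.
sample = MaintainedSample(capacity=3, seed=0)
summary = SchemaSummary()
for i in range(1000):
    doc = {"_id": i, "qty": i}
    if i == 500:
        doc["note"] = "rare field"  # appears in exactly one document
    sample.on_insert(doc)
    summary.on_insert(doc)
```

      In this sketch the summary records the rare "note" field even though a three-document sample will usually miss it, which is the trade-off between the two options named above: the sample stays small and representative, while the summary sees every document at the cost of per-write work.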

      Motivation

      Having an accurate view of a MongoDB collection's schema is critical to the correct operation of the BI Connector (and now the $sql stage in Atlas Data Lake as well).
      Right now, schemas must either be provided manually by a user or inferred by sampling, which can often fail to capture the entire schema.
      The SQL engines use the schema when translating SQL to MQL. When the schema doesn't accurately reflect the data in the collection, the assumptions made during translation may be violated, which can cause the query to fail with runtime errors or return incorrect results. Schema management has consistently been the #1 pain point for BI Connector users. Maintaining a sample or schema summary would help improve the overall user experience for our SQL products.
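
      As a rough illustration of why sampled schemas can be incomplete: if a field appears in a fraction p of documents and n documents are drawn independently and uniformly, the chance that none of them contains the field is (1 - p)^n. The sketch below assumes that simplified model (independent draws, large collection); the function names and the numbers used are illustrative only and are not the sample sizes used by any of the tools mentioned.

```python
import math


def miss_probability(field_fraction, sample_size):
    """Chance that a uniform sample of `sample_size` independent draws
    contains no document carrying a field present in `field_fraction`
    of the collection."""
    return (1.0 - field_fraction) ** sample_size


def sample_size_for(field_fraction, max_miss_probability):
    """Smallest sample size whose miss probability is at most
    `max_miss_probability`, under the same independence assumption."""
    return math.ceil(
        math.log(max_miss_probability) / math.log(1.0 - field_fraction)
    )


# A field present in 0.1% of documents, sampled with 1000 draws,
# is missed a bit more than one time in three.
p_miss = miss_probability(0.001, 1000)

# Draws needed to push the miss probability down to 1%.
n_needed = sample_size_for(0.001, 0.01)
```

      Under these assumptions, catching rare fields reliably requires samples several times larger than the reciprocal of the field's frequency, which is part of why a summary maintained on write is attractive.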

      Cast of Characters

      • Product Owner:
      • Project Lead:
      • Program Manager:
      • Drivers Contact:

      Documentation

      Scope Document
      Technical Design Document
      Product Description

            Assignee: Unassigned
            Reporter: Backlog - Core Eng Program Management Team
            Votes: 0
            Watchers: 1

              Created:
              Updated:
              Resolved: