Drivers / DRIVERS-775

Ability to specify union


Details

    • Task
    • Status: Closed
    • Major - P3
    • Resolution: Duplicate

    Description

      Downstream Change Summary

      TBD

      Description of Linked Ticket

      Epic Summary

      Summary

      We will implement a new aggregation stage, $union, that merges the results of n pipelines while preserving duplicates. To enable merging data from multiple collections, we will also introduce an explicit stage for referencing a collection, $collection.
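
      The ticket does not specify the syntax of either stage, so the following is only a semantic sketch in plain Python, not driver or server code. It assumes that "preserving duplicates" means SQL UNION ALL behavior. The names COLLECTIONS, collection() and union(), as well as the sample documents, are hypothetical stand-ins for the proposed $collection and $union stages.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List

Document = Dict[str, object]
# A "pipeline" is modeled as a zero-argument callable that yields documents.
Pipeline = Callable[[], Iterable[Document]]

# Hypothetical in-memory collections; names and contents are illustrative only.
COLLECTIONS: Dict[str, List[Document]] = {
    "events_2019": [{"user": "a", "n": 1}, {"user": "b", "n": 2}],
    "events_2020": [{"user": "a", "n": 1}, {"user": "c", "n": 3}],
}


def collection(name: str) -> Pipeline:
    """Stand-in for the proposed $collection stage: read one collection."""
    return lambda: iter(COLLECTIONS[name])


def union(*pipelines: Pipeline) -> Iterator[Document]:
    """Stand-in for the proposed $union stage.

    Concatenates the results of n pipelines and keeps duplicates
    (assumed to behave like SQL UNION ALL).
    """
    return chain.from_iterable(p() for p in pipelines)


merged = list(union(collection("events_2019"), collection("events_2020")))
```

      In this sketch the document {"user": "a", "n": 1} appears in both inputs and therefore twice in merged. The output order simply follows the argument order here; the ticket makes no statement about ordering.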

      Motivation

      Union is a fundamental operation in relational algebra. We have several specific scenarios:

      • BIC connector, for completeness with SQL.
      • Time-series scenarios, to combine data stored in per-period collections into one logical collection.
      • Combining collections in Data Lake, e.g. archival and recent data, or data from different regions.

      For analytical scenarios, customers expect a complete set of fundamental operations. For example, for Tableau, union and unpivot were the most requested features after joins. We will improve $lookup in the future, but delivering general, performant joins is a hard task. By contrast, union-like logic is already supported in the backend for operations that merge results across shards.

      Documentation

      Scope Document
      Design Document

            People

              Unassigned
              backlog-server-pm Backlog - Core Eng Program Management Team
              Votes: 0
              Watchers: 3
