[DOCS-9251] Documentation for BI-368: mongodrdl should be able to generate tables for collections with arrays that are not "pre-joined" Created: 31/Oct/16  Updated: 22/Nov/16  Resolved: 22/Nov/16

Status: Closed
Project: Documentation
Component/s: BI Connector
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Critical - P2
Reporter: Emily Hall Assignee: Andrew Aldridge
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Participants:
Days since reply: 7 years, 13 weeks ago
Epic Link: BI-223
Story Points: 0.25

 Description   
Documentation Changes

Include sample SQL query.

Description

Engineering Ticket Description:

Currently mongodrdl generates "pre-joined" tables for each field of type array in a collection. For users of relational models though, it would make more sense in many cases to not pre-join. So for a collection containing documents like:

{_id : 1, name : "jeff", tags : ["dog", "cat"}] }

It could generate DRDL like this instead:

schema:
- db: test
  tables:
  - table: users
    collection: users
    pipeline: []
    columns:
    - Name: _id
      MongoType: float64
      SqlName: _id
      SqlType: numeric
    - Name: name
      MongoType: string
      SqlName: name
      SqlType: varchar
  - table: users_tags
    collection: users
    pipeline:
    - $unwind:
        includeArrayIndex: tags_idx
        path: $tags
    columns:
    - Name: _id
      MongoType: float64
      SqlName: _id
      SqlType: numeric
    - Name: tags
      MongoType: string
      SqlName: tags
      SqlType: varchar
    - Name: tags_idx
      MongoType: int
      SqlName: tags_idx
      SqlType: numeric

Note that the users_tags table contains the _id field for joining on, but not the name field. With this structure, users would just join these two tables:

select u.*, t.tags from users_tags t join users u on t._id = u._id where tags = 'dog'

This would also make it easier to write queries where more than two tables from the same collection need to be joined together.

For users who are either relying on the current behavior in 1.x and are upgrading to 2.x, mongodrdl will provide a --preJoin option that preserves the current behavior.



 Comments   
Comment by Jeffrey Yemin [ 16/Nov/16 ]

Already doc'd here: https://docs-mongodbcom-staging.corp.mongodb.com/bi-connector/andrew/sqlproxy/schema-configuration.html#pre-joining

Comment by Jeffrey Yemin [ 09/Nov/16 ]

This one is really important to document for the release, so bumped the priority

Generated at Thu Feb 08 07:57:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.