Documentation Changes
Include sample SQL query.
Description
Engineering Ticket Description:
Currently, mongodrdl generates a "pre-joined" table for each array field in a collection. For users accustomed to relational models, it would often make more sense not to pre-join. For a collection containing documents like:
{_id : 1, name : "jeff", tags : ["dog", "cat"] }
It could generate DRDL like this instead:
schema:
- db: test
  tables:
  - table: users
    collection: users
    pipeline: []
    columns:
    - Name: _id
      MongoType: float64
      SqlName: _id
      SqlType: numeric
    - Name: name
      MongoType: string
      SqlName: name
      SqlType: varchar
  - table: users_tags
    collection: users
    pipeline:
    - $unwind:
        includeArrayIndex: tags_idx
        path: $tags
    columns:
    - Name: _id
      MongoType: float64
      SqlName: _id
      SqlType: numeric
    - Name: tags
      MongoType: string
      SqlName: tags
      SqlType: varchar
    - Name: tags_idx
      MongoType: int
      SqlName: tags_idx
      SqlType: numeric
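To make the effect of the $unwind stage concrete, here is a minimal Python sketch that emulates it on the sample document. The helper names `unwind` and `project` are hypothetical and exist only for this illustration; they are not part of mongodrdl or the server, and the sketch only covers fields that hold arrays.

```python
# Illustration only: emulates {$unwind: {path: "$tags", includeArrayIndex: "tags_idx"}}
# on the sample document. `unwind` and `project` are hypothetical helpers.

def unwind(doc, path, include_array_index):
    """Yield one copy of `doc` per element of the array stored under `path`."""
    values = doc.get(path) or []  # missing or empty arrays produce no rows
    for idx, value in enumerate(values):
        row = dict(doc)
        row[path] = value                # the array is replaced by one element
        row[include_array_index] = idx   # position of that element in the array
        yield row

def project(row, columns):
    """Keep only the columns listed for a table in the DRDL."""
    return {name: row[name] for name in columns}

doc = {"_id": 1, "name": "jeff", "tags": ["dog", "cat"]}

# users: empty pipeline, scalar columns only.
users = [project(doc, ["_id", "name"])]

# users_tags: one row per array element, keyed by _id, without the name column.
users_tags = [
    project(row, ["_id", "tags", "tags_idx"])
    for row in unwind(doc, "tags", "tags_idx")
]

print(users)
print(users_tags)
```

Under these assumptions, the sample document yields one row in users and two rows in users_tags, one per tag.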
Note that the users_tags table contains the _id field to join on, but not the name field. With this structure, users would simply join the two tables:
select u.*, t.tags from users_tags t join users u on t._id = u._id where t.tags = 'dog'
This would also make it easier to write queries where more than two tables from the same collection need to be joined together.
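As a sketch of the multi-table case, the Python snippet below loads the non-pre-joined tables into an in-memory SQLite database and joins three of them. SQLite is only a stand-in SQL engine here, not how the tables are actually served. The `users_friends` table, its `friends` array field, and the names 'ann' and 'bob' are hypothetical and were added purely for illustration.

```python
import sqlite3

# Illustration only: SQLite stands in for the SQL layer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (_id NUMERIC, name VARCHAR);
    CREATE TABLE users_tags (_id NUMERIC, tags VARCHAR, tags_idx NUMERIC);
    -- Hypothetical table for a second array field named "friends".
    CREATE TABLE users_friends (_id NUMERIC, friends VARCHAR, friends_idx NUMERIC);

    INSERT INTO users VALUES (1, 'jeff');
    INSERT INTO users_tags VALUES (1, 'dog', 0), (1, 'cat', 1);
    INSERT INTO users_friends VALUES (1, 'ann', 0), (1, 'bob', 1);
""")

# Each array table carries only _id plus its own columns,
# so every join uses the same key.
query = """
    SELECT u.name, t.tags, f.friends
    FROM users u
    JOIN users_tags t ON t._id = u._id
    JOIN users_friends f ON f._id = u._id
    WHERE t.tags = 'dog'
    ORDER BY f.friends_idx
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('jeff', 'dog', 'ann'), ('jeff', 'dog', 'bob')]
```

Because no array table repeats the parent's scalar columns, adding a further array field only adds one more join on _id.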
For users who rely on the current behavior in 1.x and are upgrading to 2.x, mongodrdl will provide a --preJoin option that preserves the current behavior.