Major - P3
(copied to CRM)
Materialized views would allow developers that use MongoDB to move data denormalization from the application layer to the database layer. While the process of materializing the view will require additional writes and slow the effective throughput of the database per write request, if used properly it should be a net wash in terms of real world performance as applications will not be sending additional requests to write the same denormalized data.
A simple expansion of the command for creating a view. A new materialized: <bool> option which defaults to false would allow the view to be created as a materialized view:
- Materialized views exist on disk but cannot be written to by the user. Internally a materialized view is a normal collection that is maintained by MongoDB using a defined pipeline rather than maintained by writes from an application.
- Updates to the materialized view are not atomic. In other words, queries against the collection(s) that back(s) a materialized view may return results that are inconsistent with the results of the materialized view.
- Materialized views do not use the indexes or sharding of the underlying collection(s)
- Materialized views can be indexed and/or sharded like a normal collection
- The aggregation pipline for a materialized view must emit an _id
- Materialized views write the result of the pipeline as an upsert against _id
- When updating a document in the underlying collection the pipeline will only be run against the document being updated. To ensure correct results materialized views should only use operators that can properly work against a single document. Materialized views typically should not use pipeline stages that operate against a set of documents such as:
- The database should not enforce any limitation on the above commands because they may still be useful in connection with other stages such as $unwind.
- Materialized views may use $lookup, however, updates against the joined collection will not update associated documents in the materialized view. Applications may force an update of associated materialized documents by forcing an update of the appropriate backing documents in the collection that the materialized view is based on.
- It seems that the ability to update the definition of a materialized view would be extremely desirable, however, this could also be a monumentally expensive operation during which data on the collection would be in an inconsistent state. Ensuring consistency doubles the space requirements. If updates to the definition are allowed, it might make sense to allow the user to choose between consistency or an in-place strategy (inPlace: <bool>, default false).
- I'm not sure how to handle the $match stage, especially in the case where a record matched, and so was materialized, and later does not match, and should be removed. The simplistic implementation could simply dictate that match is bad, and you should emit an empty record instead, but that seems messy. It may be that materialized views require an extra internal collection of some sort to store the relationships between the input document(s) and the output record(s). Such a collection might conceivably also be useful for allowing a strong materialization of records pulled in from $lookup.