[SERVER-27698] Materialized Views Created: 16/Jan/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: John Crenshaw Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 28
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-2150 Persistent Queries Closed
Related
is related to SERVER-10788 Writable views Backlog
Assigned Teams:
Query Execution
Participants:
Case:

 Description   

Materialized views would allow developers that use MongoDB to move data denormalization from the application layer to the database layer. While the process of materializing the view will require additional writes and slow the effective throughput of the database per write request, if used properly it should be a net wash in terms of real world performance as applications will not be sending additional requests to write the same denormalized data.

Proposed Command

A simple expansion of the command for creating a view. A new materialized: <bool> option which defaults to false would allow the view to be created as a materialized view:

db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline>, materialized: <bool> } )

Proposed Behavior

  • Materialized views exist on disk but cannot be written to by the user. Internally a materialized view is a normal collection that is maintained by MongoDB using a defined pipeline rather than maintained by writes from an application.
  • Updates to the materialized view are not atomic. In other words, queries against the collection(s) that back(s) a materialized view may return results that are inconsistent with the results of the materialized view.
  • Materialized views do not use the indexes or sharding of the underlying collection(s)
  • Materialized views can be indexed and/or sharded like a normal collection
  • The aggregation pipline for a materialized view must emit an _id
  • Materialized views write the result of the pipeline as an upsert against _id
  • When updating a document in the underlying collection the pipeline will only be run against the document being updated. To ensure correct results materialized views should only use operators that can properly work against a single document. Materialized views typically should not use pipeline stages that operate against a set of documents such as:
  • $limit
  • $skip
  • $group
  • $sample
  • $sort
  • $out
  • $bucket
  • $bucketAuto
  • $sortByCount
  • $count
  • The database should not enforce any limitation on the above commands because they may still be useful in connection with other stages such as $unwind.
  • Materialized views may use $lookup, however, updates against the joined collection will not update associated documents in the materialized view. Applications may force an update of associated materialized documents by forcing an update of the appropriate backing documents in the collection that the materialized view is based on.

Other Considerations

  • It seems that the ability to update the definition of a materialized view would be extremely desirable, however, this could also be a monumentally expensive operation during which data on the collection would be in an inconsistent state. Ensuring consistency doubles the space requirements. If updates to the definition are allowed, it might make sense to allow the user to choose between consistency or an in-place strategy (inPlace: <bool>, default false).
  • I'm not sure how to handle the $match stage, especially in the case where a record matched, and so was materialized, and later does not match, and should be removed. The simplistic implementation could simply dictate that match is bad, and you should emit an empty record instead, but that seems messy. It may be that materialized views require an extra internal collection of some sort to store the relationships between the input document(s) and the output record(s). Such a collection might conceivably also be useful for allowing a strong materialization of records pulled in from $lookup.


 Comments   
Comment by Daniel Pasette (Inactive) [ 18/Aug/20 ]

marcelnsquaredsoftware@gmail.com, I wonder if using a pattern of "on-demand materialized views" (which leverages the $merge functionality released in v4.2) would apply to your use case.

Comment by Marcel Kroll [ 18/Aug/20 ]

+1 to this. Not being able to do an SQL equivalent materialised view makes dealing with datasets spanning millions of records a real issue especially when we are dealing with down stream clients having to wait for complex aggregate queries to complete on large data sets (or even hit HTTP timeouts due to the query taking too long).

In cases where 100% accuracy isn't required being able to instantly query even old data would be preferable if the result returned near instantaneously.

 

M.

 

Comment by Ravi N [ 21/Dec/18 ]

Thank you.
If it was implemented it would be good and helps developers a lot.

Ravi Kumar Nagireddygari
Morgan Stanley | Corporate & Funding Technology
7-8/F, Campus 6A, RMZ Ecoworld, Sarjapur | Marathahalli Outer Ring Rd, Devarabisana Halli, Bengaluru East Taluk
Bengaluru, 560103
Phone: +91 80 6104-2021
Mobile: +91 7 7022-22170
Ravi.Kumar.Nagireddy@morganstanley.com

Be carbon conscious. Please consider our environment before printing this email.

Comment by Ramon Fernandez Marina [ 20/Dec/18 ]

ravi.kumar.nagireddy@morganstanley.com, this feature is in our backlog but unfortunately there's no concrete timeline for it at the moment. We'll post updates to this ticket as that changes in the future.

Thanks,
Ramón.

Comment by Ravi N [ 12/Dec/18 ]

Is there a timeline when this feature would be available?

Comment by Mark Agarunov [ 17/Jan/17 ]

Hi bugslayer,

Thank you for the detailed example. I've set the fixVersion to "Needs Triage" for this new feature to be scheduled against our currently planned work. Updates will be posted on this ticket as they happen.

Thanks,
Mark

Generated at Thu Feb 08 04:15:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.