[SERVER-57503] Investigate up-front view resolution for $lookup on sharded views Created: 07/Jun/21  Updated: 06/Jul/21  Resolved: 22/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Katherine Wu (Inactive) Assignee: Alya Berciu
Resolution: Done Votes: 0
Labels: read-only-views
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-32548 Add $lookup support for sharded views Closed
Sprint: Query Optimization 2021-06-28

 Description   

The current design for $lookup support for sharded views is to perform view resolution at query runtime, which is not ideal from a design standpoint and could result in suboptimal performance in the case of nested $lookups. Investigate the engineering effort of resolving the view up front instead.



 Comments   
Comment by Katherine Wu (Inactive) [ 06/Jul/21 ]

A slight addition to alya.berciu's comment above: if there were a nested $lookup whose 'from' field is a sharded view, doing the view resolution after targeting would result in slightly worse performance. The top-level $lookup rebuilds its subpipeline for every input document, so the nested $lookup would have to handle the CommandOnShardedViewNotSupportedOnMongod error each time the top-level $lookup's subpipeline runs.

If we wanted to circumvent this, we would have to either traverse the pipeline and the subpipelines of every stage looking for $lookup stages and resolve their 'from' namespaces (which has its difficulties, as Alya mentioned above), or edit the stages inside the top-level DocumentSourceLookUp's `_resolvedPipeline` the first time a nested $lookup encounters a CommandOnShardedViewNotSupportedOnMongod error, which seems hacky and prone to other problems. Since $unionWith already handles a similar situation by catching CommandOnShardedViewNotSupportedOnMongod, it seems we should follow the same approach for the sharded $lookup case.
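To make the per-document cost concrete, here is a toy C++ model, not server code. `NestedLookup`, `runNestedOnce` and `countViewErrors` are hypothetical names, and the model assumes that the nested $lookup caches its view resolution only for the lifetime of the stage object, and that the top-level $lookup builds a fresh subpipeline (and therefore a fresh nested stage) for every input document:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the nested $lookup stage inside the top-level
// $lookup's subpipeline. This is not MongoDB's DocumentSourceLookUp.
struct NestedLookup {
    std::string fromNss;
    bool viewResolved = false;  // cached resolution lives only as long as this object
};

// Runs the nested stage once. If 'from' is a sharded view that has not been
// resolved yet, we pay for one simulated CommandOnShardedViewNotSupportedOnMongod
// round trip and then cache the resolution on this stage object.
std::size_t runNestedOnce(NestedLookup& stage, bool fromIsShardedView) {
    if (fromIsShardedView && !stage.viewResolved) {
        stage.viewResolved = true;
        return 1;
    }
    return 0;
}

// Counts simulated view-resolution errors across all top-level input documents.
// 'resolvedUpFront' models rewriting the nested 'from' before the pipeline is sent.
std::size_t countViewErrors(std::size_t topLevelInputDocs, bool fromIsShardedView,
                            bool resolvedUpFront) {
    std::size_t errors = 0;
    for (std::size_t i = 0; i < topLevelInputDocs; ++i) {
        // The top-level $lookup rebuilds its subpipeline per input document,
        // so the nested stage (and its cache) starts fresh every time.
        NestedLookup nested{"db.shardedView", resolvedUpFront};
        errors += runNestedOnce(nested, fromIsShardedView);
    }
    return errors;
}
```

Under these assumptions, the number of handled errors grows linearly with the number of top-level input documents when resolution happens at runtime, while up-front resolution removes it from the per-document loop. The model says nothing about how expensive each handled error is in practice.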

Comment by Alya Berciu [ 22/Jun/21 ]

I took a look at whether it would be possible to do the view resolution before splitting the pipelines as katherine.wu suggested. I don't think we should try to do this as part of the sharded lookup project.

First, it is worth noting that $unionWith already handles this as described in the design document: we catch the CommandOnShardedViewNotSupportedOnMongod exception in situations where we need to obtain a sharded view definition, and then reconstruct the pipeline from the view definition enclosed in the exception itself:
https://github.com/mongodb/mongo/blob/3befdc7d70fa56085bbdc9606da0db84b5b48ccd/src/mongo/db/pipeline/document_source_union_with.cpp#L194-L206
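A minimal sketch of that catch-and-rebuild pattern is below. It is illustrative only: `ResolvedView`, `ShardedViewError`, `ViewCatalog`, `dispatch` and `dispatchWithViewRetry` are hypothetical stand-ins, stages are modelled as plain strings, and the catalog is assumed to hold fully resolved view definitions (underlying collection plus the complete view pipeline). The linked $unionWith code is the actual logic.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stages modelled as strings purely for illustration.
using Pipeline = std::vector<std::string>;

// Hypothetical: a fully resolved view (underlying collection + view pipeline).
struct ResolvedView {
    std::string underlyingNss;
    Pipeline viewPipeline;
};

// Hypothetical exception modelled on CommandOnShardedViewNotSupportedOnMongod:
// it carries the view definition back to the caller.
struct ShardedViewError : std::runtime_error {
    ResolvedView view;
    explicit ShardedViewError(ResolvedView v)
        : std::runtime_error("command on sharded view"), view(std::move(v)) {}
};

// Hypothetical view catalog as seen by the shard that raises the error.
struct ViewCatalog {
    std::map<std::string, ResolvedView> views;
};

struct Dispatched {
    std::string nss;
    Pipeline pipeline;
};

// Simulated dispatch: fails with ShardedViewError when 'nss' names a view.
Dispatched dispatch(const ViewCatalog& catalog, const std::string& nss,
                    const Pipeline& pipeline) {
    auto it = catalog.views.find(nss);
    if (it != catalog.views.end()) {
        throw ShardedViewError(it->second);
    }
    return {nss, pipeline};
}

// Catch the error, prepend the view's pipeline to the user's pipeline, and
// retry against the underlying collection named in the error.
Dispatched dispatchWithViewRetry(const ViewCatalog& catalog, const std::string& nss,
                                 const Pipeline& pipeline) {
    try {
        return dispatch(catalog, nss, pipeline);
    } catch (const ShardedViewError& e) {
        Pipeline resolved = e.view.viewPipeline;
        resolved.insert(resolved.end(), pipeline.begin(), pipeline.end());
        return dispatch(catalog, e.view.underlyingNss, resolved);
    }
}
```

The appeal of this design is that the caller never needs its own copy of the view catalog: the view definition arrives inside the error, at the cost of one failed attempt per pipeline construction.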

The problem that I ran into was that we don't currently have a good way to resolve views using the primary shard's view catalog directly in mongos.

Since mongos is not a mongod, it can't access the primary shard's view catalog to resolve any views. We also can't generate the CommandOnShardedViewNotSupportedOnMongod exception before actually targeting the shards, since the view definition is known to the primary shard, not to mongos. Doing this after targeting would result in the same overhead as handling the exception on the mongod of each shard that executes its portion of the pipeline.

We would need to implement a command (or something similar) that requests the view definition from the primary shard's view catalog, so that mongos can update the pipeline with the view definitions before splitting it. However, as Katherine pointed out, this would at best be a temporary measure until global view catalogs are implemented.
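For comparison, the up-front alternative might look roughly like the sketch below: walk the pipeline and every nested $lookup subpipeline on mongos before the split, and rewrite each 'from' that names a view. `PrimaryViewCatalog::fetch` stands in for the hypothetical command that would ask the primary shard for a view definition (no such command exists today), and `Stage` and `ViewDefinition` are simplified placeholders rather than server types.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified placeholder for an aggregation stage; only $lookup uses 'from'
// and 'subPipeline' in this sketch.
struct Stage {
    std::string name;
    std::string from;
    std::vector<Stage> subPipeline;
};

// Hypothetical fully resolved view definition.
struct ViewDefinition {
    std::string underlyingNss;
    std::vector<Stage> pipeline;
};

// Hypothetical stand-in for asking the primary shard's view catalog for a
// definition; each fetch() models one mongos -> primary shard round trip.
struct PrimaryViewCatalog {
    std::map<std::string, ViewDefinition> views;
    mutable int requests = 0;

    std::optional<ViewDefinition> fetch(const std::string& nss) const {
        ++requests;
        auto it = views.find(nss);
        if (it == views.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};

// Walks the pipeline and every nested $lookup subpipeline before the pipeline
// is split, rewriting each $lookup whose 'from' names a view.
void resolveViewsUpFront(std::vector<Stage>& pipeline, const PrimaryViewCatalog& catalog) {
    for (Stage& stage : pipeline) {
        if (stage.name == "$lookup") {
            if (auto view = catalog.fetch(stage.from)) {
                // Prepend the view's pipeline and point 'from' at the underlying collection.
                std::vector<Stage> combined = view->pipeline;
                combined.insert(combined.end(), stage.subPipeline.begin(),
                                stage.subPipeline.end());
                stage.subPipeline = std::move(combined);
                stage.from = view->underlyingNss;
            }
        }
        // Recurse so that $lookups nested inside subpipelines are resolved too.
        resolveViewsUpFront(stage.subPipeline, catalog);
    }
}
```

In this model every $lookup costs one request to the primary shard, and only $lookup subpipelines are traversed; a real version would also have to account for other stages that carry subpipelines, such as $unionWith and $facet.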

Comment by Katherine Wu (Inactive) [ 07/Jun/21 ]

kyle.suarez: the initial design was to handle view resolution during query execution, in DocumentSourceLookUp's doGetNext() function. An "up front" alternative might be to resolve the view on mongos by querying the view catalog stored on the database's primary shard, and to update the DocumentSourceLookUp accordingly before sending it to the shards for execution.

Comment by Kyle Suarez [ 07/Jun/21 ]

What do we mean by "up front" here if it's not at query runtime? Would it be at the time of view creation (or any catalog change)?

Generated at Thu Feb 08 05:42:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.