[SERVER-57503] Investigate up-front view resolution for $lookup on sharded views Created: 07/Jun/21 Updated: 06/Jul/21 Resolved: 22/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Katherine Wu (Inactive) | Assignee: | Alya Berciu |
| Resolution: | Done | Votes: | 0 |
| Labels: | read-only-views | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | Query Optimization 2021-06-28 | ||||||||
| Description |
|
The current design for $lookup support for sharded views is to perform view resolution at query runtime, which is not ideal from a design standpoint and could result in suboptimal performance in the case of nested $lookups. Investigate the engineering effort of resolving the view up front instead. |
| Comments |
| Comment by Katherine Wu (Inactive) [ 06/Jul/21 ] |
|
A slight addition to alya.berciu's comment above: if there were a nested $lookup whose 'from' field is a sharded view, doing the view resolution after targeting would result in slightly worse performance. The top-level $lookup rebuilds its subpipeline for every input document, so the nested $lookup would have to handle the CommandOnShardedViewNotSupportedOnMongod error once each time the top-level $lookup subpipeline is run. To circumvent this we'd have to do one of two things: traverse the pipeline and the subpipelines of every stage to look for a $lookup and resolve its 'from' namespace (which has its difficulties, as Alya mentioned above), or edit the stages inside the top-level DocumentSourceLookUp's `_resolvedPipeline` the first time a nested $lookup encounters a CommandOnShardedViewNotSupportedOnMongod error, which seems hacky and prone to other problems. Given that $unionWith already handles a similar situation with the CommandOnShardedViewNotSupportedOnMongod approach, it seems like we should follow the same approach for the sharded $lookup case. |
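
To make the per-document cost concrete, here is a toy Python model of the behaviour described in this comment. It is only a sketch, not server code: the exception class, `VIEW_CATALOG`, `NestedLookup` and `run_top_level_lookup` are all hypothetical stand-ins, and it assumes the error carries enough information to retry against the resolved namespace.

```python
class CommandOnShardedViewNotSupportedOnMongod(Exception):
    """Toy stand-in for the server error (assumed to carry the resolved view)."""

    def __init__(self, resolved_namespace, view_pipeline):
        super().__init__(f"view resolves to {resolved_namespace}")
        self.resolved_namespace = resolved_namespace
        self.view_pipeline = view_pipeline


# Hypothetical view catalog: view name -> (backing collection, view pipeline).
VIEW_CATALOG = {"orders_view": ("orders", [{"$match": {"status": "open"}}])}


class NestedLookup:
    """Models a nested $lookup that resolves its 'from' view at runtime."""

    def __init__(self, from_ns):
        self.from_ns = from_ns
        self.exceptions_handled = 0

    def run(self):
        try:
            if self.from_ns in VIEW_CATALOG:
                # Simulates the shard rejecting a command that targets a view.
                raise CommandOnShardedViewNotSupportedOnMongod(
                    *VIEW_CATALOG[self.from_ns]
                )
            return self.from_ns, []
        except CommandOnShardedViewNotSupportedOnMongod as err:
            # Handle the error by retrying against the resolved namespace.
            self.exceptions_handled += 1
            return err.resolved_namespace, err.view_pipeline


def run_top_level_lookup(input_docs, nested_from):
    """Rebuilds the subpipeline per input document and counts handled errors."""
    total = 0
    for _ in input_docs:
        nested = NestedLookup(nested_from)  # fresh subpipeline, nothing cached
        nested.run()
        total += nested.exceptions_handled
    return total
```

In this model the number of handled errors grows linearly with the number of input documents to the top-level $lookup, while a nested $lookup on a plain collection handles none.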
| Comment by Alya Berciu [ 22/Jun/21 ] |
|
I took a look at whether it would be possible to do the view resolution before splitting the pipelines, as katherine.wu suggested. I don't think we should try to do this as part of the sharded lookup project. First, it is worth noting that $unionWith already handles this as described in the design document, where we just handle a CommandOnShardedViewNotSupportedOnMongod exception. The problem that I ran into was that we don't currently have a good way to resolve views using the primary shard's view catalog directly in mongos. Since we are not on a mongod, we can't access the primary's view catalog to resolve any views. We also can't generate the CommandOnShardedViewNotSupportedOnMongod exception before actually targeting the shards (the primary will always know the view definition). Doing this after targeting would result in the same overhead as handling the exception in the mongod of each shard that executes its portion of the pipeline. We would need to implement a command (or something similar) to request the view definition from the primary shard's view catalog so we can update the pipeline with the view definitions before splitting it. However, as Katherine pointed out, this would be at best a temporary measure until global view catalogs get implemented. |
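
As a rough illustration of what the up-front alternative would involve, the Python sketch below rewrites every $lookup in a pipeline, including nested ones, using view definitions fetched from the primary shard. `fetch_view_definition` is a hypothetical stand-in for the command that does not exist today, the catalog is modelled as a plain dict, and the sketch deliberately ignores views defined on other views and $lookup stages inside the view's own pipeline.

```python
import copy


def fetch_view_definition(primary_catalog, name):
    """Hypothetical stand-in for a command sent to the primary shard.

    Returns {"viewOn": ..., "pipeline": [...]} for a view, or None otherwise.
    """
    return primary_catalog.get(name)


def resolve_lookups_up_front(pipeline, primary_catalog):
    """Returns a copy of `pipeline` with $lookup 'from' views resolved."""
    resolved = []
    for stage in pipeline:
        stage = copy.deepcopy(stage)  # never mutate the caller's pipeline
        spec = stage.get("$lookup")
        if spec is not None:
            # Resolve nested $lookup stages in the subpipeline first.
            sub = resolve_lookups_up_front(spec.get("pipeline", []), primary_catalog)
            view = fetch_view_definition(primary_catalog, spec["from"])
            if view is not None:
                # Target the backing collection and prepend the view pipeline.
                spec["from"] = view["viewOn"]
                sub = copy.deepcopy(view["pipeline"]) + sub
            if sub or "pipeline" in spec:
                spec["pipeline"] = sub
        resolved.append(stage)
    return resolved
```

Even in this simplified form, the traversal has to know about every stage that can hold a subpipeline, which is part of the difficulty noted above; the sketch only handles $lookup.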
| Comment by Katherine Wu (Inactive) [ 07/Jun/21 ] |
|
kyle.suarez the initial design for this was to handle the view resolution during the query's execution in the DocumentSourceLookUp's doGetNext() function; an "up front" alternative might be to resolve the view on mongos by querying the view catalog stored on the db's primary shard and updating the DocumentSourceLookUp accordingly, before sending it to be executed on the shards. |
| Comment by Kyle Suarez [ 07/Jun/21 ] |
|
What do we mean by "up front" here if it's not at query runtime? Would it be at the time of view creation (or any catalog change)? |