[SERVER-63845] Separate interface to get set of referenced variables from DocumentSource::getDependencies() Created: 18/Feb/22  Updated: 29/Oct/23  Resolved: 19/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Nicholas Zolnierz
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-63141 Difference in $lookup/$redact/$let be... Closed
is related to SERVER-66548 $lookup sequential cache can incorrec... Closed
is related to SERVER-68691 $graphLookup does not report variable... Closed
is related to SERVER-60800 Allow $search in $lookup/$unionWith Closed
Backwards Compatibility: Fully Compatible
Sprint: QO 2022-05-30, QO 2022-06-13, QO 2022-06-27, QO 2022-07-11, QO 2022-07-25, QO 2022-08-08, QO 2022-08-22
Participants:
Linked BF Score: 6

 Description   

The DocumentSource::getDependencies() interface currently returns two pieces of information:

  1. The set of depended-on field paths. This is used internally to generate a projection, which can result in covered plans or can spare downstream pipeline stages from processing documents that contain unnecessary fields.
  2. The set of referenced variables. At the moment, this is used only to identify non-correlated $lookup sub-pipeline prefixes, so that the results from such a prefix can be cached and re-used.

Dependency analysis and computing the set of referenced variables are logically different operations, so it makes sense to separate them. Also, a DocumentSource may legally return DepsTracker::NOT_SUPPORTED to indicate that it does not participate in dependency analysis. This makes sense for "source" stages such as $cursor or $mergeCursors. However, NOT_SUPPORTED is problematic for callers that want to know which variables are referenced: because such a stage reports nothing, they must defensively assume that any variable could be referenced. See the fix from SERVER-63141 for an example.
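To illustrate the problem described above, here is a minimal sketch using simplified stand-in types. `Stage`, `MatchOnVar`, `CursorLike`, `VarId`, and `mayReference()` are hypothetical names invented for this example, and the `DepsTracker` shown is a toy reduction, not the server's real class. It only shows why a combined interface forces a conservative answer once a stage reports NOT_SUPPORTED.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for Variables::Id.
using VarId = long long;

// Toy reduction of a dependency tracker that carries both kinds of information.
struct DepsTracker {
    enum State { SEE_NEXT, EXHAUSTIVE_FIELDS, NOT_SUPPORTED };
    std::set<std::string> fields;  // depended-on field paths
    std::set<VarId> vars;          // referenced variables
};

struct Stage {
    virtual ~Stage() = default;
    virtual DepsTracker::State getDependencies(DepsTracker* deps) const = 0;
};

// A stage that participates in dependency analysis and references variable 7.
struct MatchOnVar : Stage {
    DepsTracker::State getDependencies(DepsTracker* deps) const override {
        deps->fields.insert("a");
        deps->vars.insert(7);
        return DepsTracker::SEE_NEXT;
    }
};

// A "source"-like stage that opts out of dependency analysis entirely.
struct CursorLike : Stage {
    DepsTracker::State getDependencies(DepsTracker*) const override {
        return DepsTracker::NOT_SUPPORTED;
    }
};

// Returns std::nullopt when the stage opted out, i.e. the set is unknown.
std::optional<std::set<VarId>> referencedVars(const Stage& stage) {
    DepsTracker deps;
    if (stage.getDependencies(&deps) == DepsTracker::NOT_SUPPORTED) {
        return std::nullopt;
    }
    return deps.vars;
}

// The caller must answer "yes" for every variable when the set is unknown.
bool mayReference(const Stage& stage, VarId id) {
    auto vars = referencedVars(stage);
    return !vars || vars->count(id) > 0;
}
```

In this sketch, `mayReference(CursorLike{}, id)` is true for every `id`, even though the stage references no variables at all, which is the kind of over-approximation the ticket wants to remove.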

To make this more natural for callers that want to analyze variable references, we should change getDependencies() so that it no longer returns the set of referenced variables inside the DepsTracker. In its place we should introduce a new virtual method, DocumentSource::getReferencedVars(). This would be a pure virtual function that all classes derived from DocumentSource must implement, and it would return a set of Variables::Id identifiers.
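A rough sketch of the shape this proposal could take, again with simplified stand-in types. Only the method name getReferencedVars() and the idea of returning a set of variable ids come from the description above. `Stage`, `SourceStage`, `VarStage`, `VarId`, and `nonCorrelatedPrefixLength()` are hypothetical, and the prefix computation is an assumed illustration of the $lookup caching use case, not the server's actual logic.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for Variables::Id.
using VarId = long long;

struct Stage {
    virtual ~Stage() = default;
    // Pure virtual: every stage must state which variables it references,
    // independently of whether it supports field dependency analysis.
    virtual std::set<VarId> getReferencedVars() const = 0;
};

// A "source"-like stage: no dependency analysis, but it can still say
// definitively that it references no variables.
struct SourceStage : Stage {
    std::set<VarId> getReferencedVars() const override { return {}; }
};

// A stage that references a fixed set of variables.
struct VarStage : Stage {
    explicit VarStage(std::set<VarId> v) : vars(std::move(v)) {}
    std::set<VarId> getReferencedVars() const override { return vars; }
    std::set<VarId> vars;
};

// Counts the leading stages that reference none of the outer variables.
// Under the assumptions of this sketch, that prefix is the cacheable part.
std::size_t nonCorrelatedPrefixLength(
    const std::vector<std::unique_ptr<Stage>>& pipeline,
    const std::set<VarId>& outerVars) {
    std::size_t length = 0;
    for (const auto& stage : pipeline) {
        for (VarId id : stage->getReferencedVars()) {
            if (outerVars.count(id) > 0) {
                return length;  // first correlated stage ends the prefix
            }
        }
        ++length;
    }
    return length;
}
```

Because the method is pure virtual, a stage cannot silently opt out; a stage with no variable references returns an empty set instead of forcing the caller to guess.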

As part of this work, we should change the $$SEARCH_META static analysis from SERVER-60800 to consume the new DocumentSource::getReferencedVars() interface. As of this writing, that code has not landed in the enterprise modules, but the current plan is for it to use an allowlist of stages that don't support dependency analysis but are known not to reference any variables. Adding DocumentSource::getReferencedVars() would allow us to replace this workaround with a more maintainable solution.



 Comments   
Comment by Githook User [ 18/Aug/22 ]

Author: Nicholas Zolnierz (username: nzolnierzmdb, email: nicholas.zolnierz@mongodb.com)

Message: SERVER-63845 Separate variable reference tracking from pipeline field dependency analysis
Branch: master
https://github.com/mongodb/mongo/commit/39a79c12b930b7adc5fe2872e482f9e483121dcf

Comment by David Storch [ 25/May/22 ]

The lack of a separate interface for the variables referenced by a `DocumentSource` was essentially the root cause of SERVER-66548. We've had a couple of bugs crop up lately around incorrect detection of the non-correlated pipeline prefix, so I think it's time that we schedule this work. Putting it back into the triage queue.

Generated at Thu Feb 08 05:58:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.