[SERVER-78817] Parameterize pushed down $match stages for SBE plan cache and tests for same Created: 10/Jul/23  Updated: 29/Oct/23  Resolved: 31/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Kevin Cherkauer
Assignee: Kevin Cherkauer
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams: Query Execution
Backwards Compatibility: Fully Compatible
Participants:

 Description   

$match stages that get pushed down to SBE should be parameterized for the SBE plan cache to avoid cache flooding.

See the discussion on line 437 of canonical_query_encoder.cpp in the PoC review here:

https://github.com/10gen/mongo/pull/13744/files/0f09d5d7a4c875aef4a8cc172238028a534f8e81#diff-505b6901656a3b7a189eb468f202966c69a3dafd3004fef097dbd4fd3089f449



 Comments   
Comment by Githook User [ 31/Aug/23 ]

Author: Kevin Cherkauer <kevin.cherkauer@mongodb.com> (username: kevin-cherkauer)

Message: SERVER-78817 Parameterize pushed down $match stages for SBE plan cache
Branch: master
https://github.com/mongodb/mongo/commit/6833cb425d4e662247cece7405a0f56604589ed0

Comment by Kevin Cherkauer [ 25/Aug/23 ]

The main parts of this change are:

  1. Parameterize the entire forest of MatchExpressions that can now be pushed down to SBE, instead of just the primary one. Before this feature's additional $match pushdowns, there was only a single MatchExpression per query in SBE, so the forest only contained one tree. Dealing with a true forest required refactoring a few things.
  2. Encode the new trees of the forest into the plan cache key.
  3. Bind the parameters for the new trees at bind-in time.
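
The three steps above can be illustrated with a small conceptual model. This is only a sketch in Python, not the server's C++ code: the classes Const, Param, Cmp and And, the functions parameterize_forest(), encode_forest() and bind_in(), and the key format are all hypothetical names invented for illustration. It assumes parameter ids are numbered consecutively across the whole forest.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class Const:
    value: Any


@dataclass
class Param:
    param_id: int


@dataclass
class Cmp:  # hypothetical leaf predicate, e.g. {a: {$eq: 1}}
    path: str
    op: str
    rhs: Union[Const, Param]


@dataclass
class And:  # hypothetical interior node
    children: List[Any]


def parameterize_forest(forest):
    """Step 1: replace constants in every tree with numbered parameters.

    Returns the constant values, indexed by parameter id. Ids are shared
    across all trees of the forest (an assumption of this sketch).
    """
    values = []

    def visit(node):
        if isinstance(node, And):
            for child in node.children:
                visit(child)
        elif isinstance(node.rhs, Const):
            values.append(node.rhs.value)
            node.rhs = Param(len(values) - 1)

    for tree in forest:
        visit(tree)
    return values


def encode_forest(forest):
    """Step 2: encode every tree of the forest into one cache-key string."""

    def enc(node):
        if isinstance(node, And):
            return "and[" + ",".join(enc(c) for c in node.children) + "]"
        if isinstance(node.rhs, Param):
            rhs = f"?{node.rhs.param_id}"
        else:
            rhs = repr(node.rhs.value)
        return f"{node.op}({node.path},{rhs})"

    return "|".join(enc(tree) for tree in forest)


def bind_in(values):
    """Step 3: map each parameter id to the value of the current query."""
    return dict(enumerate(values))


def make_query(a, b):
    # Tree 0 models the primary MatchExpression, tree 1 a pushed-down $match.
    primary = Cmp("a", "eq", Const(a))
    pushed_down = And([Cmp("b", "gt", Const(b)), Cmp("c", "eq", Const("x"))])
    return [primary, pushed_down]


q1, q2 = make_query(1, 10), make_query(2, 20)
values1, values2 = parameterize_forest(q1), parameterize_forest(q2)
key1, key2 = encode_forest(q1), encode_forest(q2)
slots2 = bind_in(values2)
```

In this model both queries produce the same key, so they would share one cache entry, and only the bound values differ between them. If only the first tree were parameterized, the constants 10 and 20 would remain in the key and each distinct value would produce its own entry.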

Additional related changes:

  1. Enforce the maxMatchExpressionParams limit (default 512) globally across the forest instead of only per tree.
  2. Stop parameterizing when the limit is reached. (The previous code would parameterize the entire primary MatchExpression even if this created far more than 512 parameters, then roll it all back in a second step if the count turned out to exceed the limit.)
  3. Eliminate revertMode (the rollback of previously created MatchExpression parameters if the limit was exceeded). The rollback is unnecessary: to avoid cache flooding, we won't cache plans that exceeded the limit, and a partially parameterized plan still works correctly. (Rolling back would also be harder in a forest than in a single tree, as it would need to revisit previously completed trees.) The rollback was needed before because the parameterization pass might have created orders of magnitude more parameters than the limit, which would make the bind-in phase slow. With the current PR we stop creating parameters after 512, so this problem goes away.
  4. Consolidate the CQ-related checks for whether to parameterize into a new CanonicalQuery::shouldParameterizeSbe() method.
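
The limit handling described in items 1 to 3 can be sketched as follows. Again this is a conceptual Python model rather than the actual implementation: Leaf, Result and parameterize_forest_with_limit() are hypothetical names, each tree is flattened to a list of its constant-bearing leaves, and the exact threshold semantics (here, the plan is marked non-cacheable once a constant is found after max_params parameters already exist) is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Leaf:  # hypothetical: one constant-bearing predicate
    path: str
    value: Any
    param_id: Optional[int] = None  # None means the constant stays inline


@dataclass
class Result:
    num_params: int
    cacheable: bool


def parameterize_forest_with_limit(forest: List[List[Leaf]], max_params: int = 512) -> Result:
    """Parameterize leaves across all trees, counting against one global limit."""
    next_id = 0
    for tree in forest:
        for leaf in tree:
            if next_id >= max_params:
                # Limit reached: stop creating parameters. Parameters created
                # so far are kept (no rollback); the plan is just not cached.
                return Result(next_id, cacheable=False)
            leaf.param_id = next_id
            next_id += 1
    return Result(next_id, cacheable=True)


# Two trees of three leaves each, with a deliberately small limit of 4.
forest = [
    [Leaf("a", 1), Leaf("b", 2), Leaf("c", 3)],
    [Leaf("d", 4), Leaf("e", 5), Leaf("f", 6)],
]
result = parameterize_forest_with_limit(forest, max_params=4)
param_ids = [leaf.param_id for tree in forest for leaf in tree]
```

With a per-tree limit of 4, neither three-leaf tree would hit the limit. Counting globally, the second tree runs out after its first leaf, the remaining two constants stay inline, and the result is flagged as not cacheable while the partially parameterized forest remains usable.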

Naming improvements for code clarity and developer productivity:

  1. Rename CanonicalQuery::_root and root() to _primaryMatchExpression and getPrimaryMatchExpression(). "root" was a confusing name for several reasons: it is not the root of the query plan or execution tree (it is in fact part of the bottom leaf of the plan and execution trees, the opposite of the root); it is not any kind of query tree, though being part of CanonicalQuery made it sound like one; and the name's lack of uniqueness reduced developer productivity (grepping for 'root(' gets more than 1,300 hits in the codebase).
  2. Rename CanonicalQuery::init() to initCq() to improve uniqueness (grepping for 'init(' gets more than 3,000 hits).
  3. Rename CanonicalQuery::_pipeline and setPipeline() to _cqPipeline and setCqPipeline() to improve uniqueness (grepping for 'pipeline' gets more than 9,000 hits, and most of them are likely about aggregation pipelines, not the cq pushdown pipeline).