[SERVER-75604] Eliminate CollectionScanNode.filter when not needed for clustered collection scans Created: 03/Apr/23  Updated: 14/Nov/23  Resolved: 13/Nov/23

Status: Closed
Project: Core Server
Component/s: Query Execution, Query Planning
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Kevin Cherkauer
Assignee: James Harrison
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Sprint: QO 2023-10-16, QO 2023-10-30, QO 2023-11-13, QO 2023-11-27
Participants:

 Description   

A QuerySolutionNode.filter is always generated for clustered collection scans if the bounds are from expressions, apparently solely to distinguish < from <= and > from >=. In these cases, the scan does a bounds-inclusive scan, then the filter eliminates any records for bounds that are actually exclusive.

For example, a query like this against a clustered collection always generates a filter:

db.ni.find({$and: [{_id: {$gt: 1}}, {_id: {$lt: 3}}]})

However, if the bounds were specified via the "min" (always inclusive) and "max" (always exclusive) options, the plan does not generate a filter, and the scan operator is expected to enforce the correct bounds itself. For example, a query like the following against a clustered collection does NOT generate a filter:

db.ni.find().min({_id: 1}).max({_id: 2}).hint({_id: 1})

Given that it is trivially easy to enforce the correct bounds inside the scan operator, and it is already responsible for doing so for the min-max case, the optimizer should stop generating collection scan filters that exist solely for scan bound inclusive vs exclusive enforcement.
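
As a rough illustration of the difference, the sketch below contrasts the two approaches on a toy ordered container. It is not the server's implementation: `Collection`, `scanInclusiveThenFilter`, and `scanWithBoundInclusion` are hypothetical names, and a `std::map` keyed by integer `_id` stands in for a clustered collection.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a clustered collection: records ordered by _id.
using Collection = std::map<int, std::string>;

// Behavior described above: scan both bounds inclusively, then apply a
// residual filter that drops the records sitting on an exclusive bound.
std::vector<int> scanInclusiveThenFilter(const Collection& coll, int lo, int hi,
                                         bool loInclusive, bool hiInclusive) {
    std::vector<int> out;
    for (auto it = coll.lower_bound(lo); it != coll.end() && it->first <= hi; ++it) {
        if (!loInclusive && it->first == lo) continue;  // filter: exclusive lower bound
        if (!hiInclusive && it->first == hi) continue;  // filter: exclusive upper bound
        out.push_back(it->first);
    }
    return out;
}

// Proposed behavior: the scan itself honors inclusive vs exclusive bounds,
// so no filter is needed.
std::vector<int> scanWithBoundInclusion(const Collection& coll, int lo, int hi,
                                        bool loInclusive, bool hiInclusive) {
    std::vector<int> out;
    // Start at lo itself when inclusive, otherwise at the first key after lo.
    auto it = loInclusive ? coll.lower_bound(lo) : coll.upper_bound(lo);
    for (; it != coll.end(); ++it) {
        // Stop past hi when inclusive, or at hi when exclusive.
        if (hiInclusive ? it->first > hi : it->first >= hi) break;
        out.push_back(it->first);
    }
    return out;
}
```

Under these assumptions both functions return the same records for any pair of bounds; the second simply never visits the records on an exclusive bound.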

This optimization may also be applicable to index scans that have been decomposed into one or more intervals.

The scan operator will need to know whether the lower and upper bounds are inclusive or exclusive. CollectionScanParams (collection_scan_common.h) has a type that is used in plan nodes to indicate this, although it is a bit hard to consume:

    enum class ScanBoundInclusion {
        kExcludeBothStartAndEndRecords,
        kIncludeStartRecordOnly,
        kIncludeEndRecordOnly,
        kIncludeBothStartAndEndRecords,
    };

It would be easier to consume if it were just two booleans like

// A scan bound is exclusive if the respective flag is false and inclusive if it is true.
bool scanLowerBoundInclusive;
bool scanUpperBoundInclusive; 
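
A minimal sketch of how the enum and the two flags could be mapped onto each other is shown below. The enum is copied from the ticket; `BoundFlags`, `toFlags`, and `fromFlags` are hypothetical helpers, and the mapping assumes a forward scan in which the start record corresponds to the lower bound and the end record to the upper bound.

```cpp
// Enum as quoted in the ticket (from collection_scan_common.h).
enum class ScanBoundInclusion {
    kExcludeBothStartAndEndRecords,
    kIncludeStartRecordOnly,
    kIncludeEndRecordOnly,
    kIncludeBothStartAndEndRecords,
};

// Hypothetical two-flag representation suggested in the ticket.
struct BoundFlags {
    bool scanLowerBoundInclusive;
    bool scanUpperBoundInclusive;
};

// Assumes a forward scan: start record == lower bound, end record == upper bound.
inline BoundFlags toFlags(ScanBoundInclusion inclusion) {
    switch (inclusion) {
        case ScanBoundInclusion::kExcludeBothStartAndEndRecords:
            return {false, false};
        case ScanBoundInclusion::kIncludeStartRecordOnly:
            return {true, false};
        case ScanBoundInclusion::kIncludeEndRecordOnly:
            return {false, true};
        case ScanBoundInclusion::kIncludeBothStartAndEndRecords:
            return {true, true};
    }
    return {false, false};  // Unreachable for valid enum values.
}

// Inverse mapping, under the same forward-scan assumption.
inline ScanBoundInclusion fromFlags(BoundFlags flags) {
    if (flags.scanLowerBoundInclusive && flags.scanUpperBoundInclusive)
        return ScanBoundInclusion::kIncludeBothStartAndEndRecords;
    if (flags.scanLowerBoundInclusive)
        return ScanBoundInclusion::kIncludeStartRecordOnly;
    if (flags.scanUpperBoundInclusive)
        return ScanBoundInclusion::kIncludeEndRecordOnly;
    return ScanBoundInclusion::kExcludeBothStartAndEndRecords;
}
```

With the flags, a scan can test each bound independently instead of switching over four enum values.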

Whether booleans or the existing enum are used, these values must be parameterized in the SBE plan cache so that cached plans do not permanently bake in inclusive vs exclusive information, but can instead be correctly reused at runtime for queries that have different bounds. (I do not know if this is already the case with the CollectionScanParams::ScanBoundInclusion CollectionScanNode.boundInclusion parameter.)
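
To make the plan cache concern concrete, here is a toy sketch in which the cached plan holds no bound values or inclusivity at all and receives them per execution. It does not model the real SBE plan cache: `ScanParams` and `CachedScanPlan` are hypothetical names, and a `std::set<int>` stands in for the clustered collection's keys.

```cpp
#include <set>
#include <vector>

// Hypothetical per-query parameters, bound when the plan is executed.
struct ScanParams {
    int lower;
    int upper;
    bool lowerInclusive;
    bool upperInclusive;
};

// Hypothetical cached plan. It deliberately stores neither the bounds nor
// their inclusivity, so one cached instance can serve queries that differ
// only in those values.
class CachedScanPlan {
public:
    std::vector<int> run(const std::set<int>& ids, const ScanParams& p) const {
        std::vector<int> out;
        auto it = p.lowerInclusive ? ids.lower_bound(p.lower) : ids.upper_bound(p.lower);
        for (; it != ids.end(); ++it) {
            if (p.upperInclusive ? *it > p.upper : *it >= p.upper) break;
            out.push_back(*it);
        }
        return out;
    }
};
```

If inclusivity were instead stored in the plan when it was first cached, a later query with the same shape but `$gte` in place of `$gt` would silently reuse the wrong setting.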

FYI david.storch@mongodb.com hana.pearlman@mongodb.com amr.elhelw@mongodb.com



 Comments   
Comment by Billy Donahue [ 14/Nov/23 ]

Based on some recent experience I've had with operator<=>, it's possible that the use of <compare> header entities like std::eq will be a problem on our XCode13.2.1.

I think they have limited support for spaceship operator. I think we might want to try a more conservative approach for the comparison operators.
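
One possible shape for such a conservative approach is to spell out the relational operators by hand using std::tie, which needs only <tuple> and no <compare> support. The committed code is not quoted in this ticket, so the `ScanBound` type and its fields below are hypothetical.

```cpp
#include <tuple>

// Hypothetical value type; the real type touched by the commit is not shown here.
struct ScanBound {
    long long recordId;
    bool inclusive;
};

// Equality and ordering written explicitly instead of defaulting operator<=>.
inline bool operator==(const ScanBound& a, const ScanBound& b) {
    return std::tie(a.recordId, a.inclusive) == std::tie(b.recordId, b.inclusive);
}
inline bool operator!=(const ScanBound& a, const ScanBound& b) {
    return !(a == b);
}
// Lexicographic order: recordId first, then inclusive (false sorts before true).
inline bool operator<(const ScanBound& a, const ScanBound& b) {
    return std::tie(a.recordId, a.inclusive) < std::tie(b.recordId, b.inclusive);
}
inline bool operator>(const ScanBound& a, const ScanBound& b) {
    return b < a;
}
inline bool operator<=(const ScanBound& a, const ScanBound& b) {
    return !(b < a);
}
inline bool operator>=(const ScanBound& a, const ScanBound& b) {
    return !(a < b);
}
```

This is more verbose than a defaulted operator<=>, but it compiles on toolchains whose spaceship operator support is incomplete.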

Comment by Githook User [ 13/Nov/23 ]

Author:

{'name': 'James Harrison', 'email': '00jamesh@gmail.com', 'username': 'jameseh96'}

Message: SERVER-75604 Eliminate CollectionScanNode.filter when not needed for clustered collection scans
Branch: master
https://github.com/mongodb/mongo/commit/c548badacf4fbd0254fd1fce6aff930abfcc9102

Generated at Thu Feb 08 06:30:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.