-
Type: Investigation
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Explain
-
Not Needed
Explain output will show vectorized expressions and block processing stages.
Description of Linked Ticket
Summary
Block Processing as a concept, initiated here at MongoDB, has reached sufficient maturity to be implemented. Inspired by vectorized query processing, block processing performs identical basic computations over a set of values of the same type, exploiting known properties of the dataset and the computation, and falling back seamlessly where this is not applicable due to MQL semantics or properties of the stored document dataset.
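The core idea above can be sketched in a few lines. This is an illustrative model only, not MongoDB's actual implementation: a predicate runs over a whole block of same-typed values in one tight pass, with an element-at-a-time fallback when the block is heterogeneous (mirroring the need to honor MQL semantics).

```python
def gt_block(values, threshold):
    """Evaluate `value > threshold` over a block of values.

    Hypothetical sketch of block processing: when the block is
    homogeneous and numeric, the comparison runs in a single pass with
    no per-value type dispatch; otherwise it falls back to per-element
    evaluation that treats non-numeric values as non-matching.
    """
    if all(isinstance(v, (int, float)) for v in values):
        # Homogeneous numeric block: one vectorizable pass.
        return [v > threshold for v in values]
    # Heterogeneous block: per-element fallback.
    return [isinstance(v, (int, float)) and v > threshold for v in values]

# Homogeneous block takes the fast path.
print(gt_block([3, 7, 1, 9], 5))      # → [False, True, False, True]
# Mixed-type block takes the fallback path.
print(gt_block([3, "x"], 2))          # → [True, False]
```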
Building directly on the preliminary container stage work (PM-3167), this project implements the block processing core, which can deliver significantly faster queries over Time Series collections. These queries typically include the $match, $project, and $group stages, along with accumulators and comparison operators.
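For concreteness, a typical Time Series aggregation of the shape this project targets looks like the following. The collection and field names (sensor_readings, ts, meta.sensorId, temperature) are hypothetical placeholders, not from the ticket.

```python
from datetime import datetime

# $match selects a time window, $project trims fields, and $group
# applies accumulators per sensor; these are the stage shapes the
# block processing core is meant to accelerate.
pipeline = [
    {"$match": {"ts": {"$gte": datetime(2023, 1, 1),
                       "$lt": datetime(2023, 2, 1)}}},
    {"$project": {"meta": 1, "temperature": 1}},
    {"$group": {"_id": "$meta.sensorId",
                "avgTemp": {"$avg": "$temperature"},
                "maxTemp": {"$max": "$temperature"}}},
]

# With a live connection this would run as:
#   db.sensor_readings.aggregate(pipeline)
print([next(iter(stage)) for stage in pipeline])  # → ['$match', '$project', '$group']
```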
Motivation
Time Series queries process larger datasets than common lookup (find()) queries, and their performance has remained a challenge despite continuous improvement efforts, including those recommended and implemented from a previous Spike. Both customer acquisition cases and published competitor benchmarks make clear that we need an order-of-magnitude improvement to grow MongoDB’s share of Time Series workloads. The Block Processing Spike project has proven that an MQL-compatible block processing model can deliver this improvement, which will now be implemented.
The technology developed in this project can in principle apply, with appropriate adapters, to queries over other densely stored primary or secondary datasets, including ADL and regular and columnstore indexes.