[SERVER-72894] Use traverseP expression to build the sort key for the common key prefix case Created: 17/Jan/23  Updated: 17/Jan/23  Resolved: 17/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Rui Liu Assignee: Rui Liu
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: QE 2023-01-23
Participants:

 Description   

Use traverseP expression to build the sort key for the common key prefix case. We should make sure the expression for the common prefix is only calculated once. We might need to add a workload for a longer key prefix to make sure it doesn't regress.

Preliminary perf: https://performance-analyzer.server-tig.prod.corp.mongodb.com/perf-analyzer-viz/?evergreen_version=63c582a232f4171734d0e297&evergreen_base_version=performance_550e070c93e80bdb4f256dd634128ad919395eb0&metrics_selection=Latency99thPercentile||ops_per_sec||Latency50thPercentile||OperationThroughput||tpmC||NewOrders||Duration||average_read_latency_us||95th_read_latency_us||99th_read_latency_us&percent_filter=0||100&z_filter=0||10



 Comments   
Comment by Rui Liu [ 17/Jan/23 ]

There seems to be some further complication regarding MQL semantics. The following text is from here:

Q: One question I have is, can you give some more context about why it's necessary to fall back to the old key generation logic in the case where two of the components share a prefix

A:

"What's so tricky about the 'paths with a common prefix' case?", you might ask.

Normally, we can compute the sort key for each part of the sort pattern
independently. However, when two or more sort pattern parts have paths with a
common prefix, their sort keys need to be computed together.

Let's consider an example. Suppose our sort pattern is

{"a.b":1, "a.c":1}

and
we're computing the sort keys for the following document:

{a: [

{b:11,c:29}

,

{b:11,c:28}

,

{b:12,c:27}

]}

If we computed the sort keys for "a.b" and "a.c" independently, then the sort
key for "a.b" would be 11 and the sort key for "a.c" would 27. This is not
what the classic execution engine does though.

The classic execution engine will traverse the array in field "a" and search for
the sub-document where "b" is the smallest, and if there are ties it will use
field "c" to break the ties. In this example, the sub-document where "b" is the
smallest (using "c" to break ties) is

{b:11,c:28}

. So the classic execution will
use 11 as the sort key for "a.b" and 28 as the sort key for "a.c".

I could implement this natively in SBE using TraverseStage. However, because I
would need to deal with multiple values in the TraverseStage but TraverseStage
only supports having one output slot and one fold expression, I would need to
put the multiple values inside an array (so that I could fold the values using a
single fold expression and return the values via a single output slot). There's
probably some non-trivial performance overhead to doing this, so I figured this
might be a good time to ask what people's thoughts are on the idea of updating
TraverseStage to support multiple output slots and multiple fold expressions.

Generated at Thu Feb 08 06:23:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.