[SERVER-32014] scanAndOrder serverStatus metric should not be incremented for $sort stages that are merging pre-sorted results Created: 17/Nov/17  Updated: 06/Dec/22  Resolved: 20/Apr/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.0, 3.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: diagnosibility
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
duplicates SERVER-24978 Second batches in aggregation framewo... Closed
Related
Assigned Teams:
Query
Operating System: ALL
Backport Requested:
v3.6, v3.4
Participants:
Case:

 Description   

An aggregation pipeline computes whether or not it has a sort stage by scanning through the pipeline looking for a DocumentSourceSort. This will include any $sort stages that are simply merging the pre-sorted results of multiple shards. Those stages should not count towards the 'hasSortStage' metric, since they will not actually perform a blocking sorting algorithm, which is what that metric is intended to represent. This also impacts the serverStatus metric "metrics.operation.scanAndOrder".
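The intended check can be sketched as follows. This is only a minimal model in plain JavaScript, not the server's C++ code: the stage objects, the `mergingPresorted` flag, and the `hasBlockingSort` helper are hypothetical names introduced for illustration.

```javascript
// Hypothetical model: each pipeline stage is a plain object.
// `mergingPresorted` is an assumed flag marking a $sort that only merges
// already-sorted streams coming back from the shards.
function hasBlockingSort(pipeline) {
  // Only a $sort that actually performs a blocking sort should count.
  return pipeline.some(
    (stage) => stage.name === "$sort" && !stage.mergingPresorted
  );
}

// Shard half of a split pipeline: the $sort here really sorts.
const shardsPart = [
  { name: "$match" },
  { name: "$sort", mergingPresorted: false },
];

// Merging half: the $sort only merges pre-sorted shard results.
const mergerPart = [{ name: "$sort", mergingPresorted: true }];

console.log(hasBlockingSort(shardsPart)); // true
console.log(hasBlockingSort(mergerPart)); // false
```

Under this model the merging half would no longer set 'hasSortStage', so it would not contribute to "metrics.operation.scanAndOrder".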

Original description, shamelessly copied from report by akira.kurogane:

The following agg command would cause a double-increment of serverStatus.metrics.operation.scanAndOrder on each shard that it was targeted to, and one further increment if that shard was also the one where the merge step happened.

db.coll.aggregate([
  {$match: {A, B, C}}, //i.e. potentially multi-shard
  {$sort: {C, D}} //i.e. something not index-compatible
])

It looks to [Akira] like this happens because recordCurOpMetrics() increments scanAndOrder once for every aggregation stage that has hasSortStage == true, and the PipelineCommand also sets that property when it finds that any child aggregation stage is a DocumentSourceSort.
Lastly, there is a third increment on the shard where the merge happens. [Akira] guesses that this is the ClusterAggregate command also setting the hasSortStage property to true.

We should ensure that any aggregation operation will only increment the scanAndOrder metric at most once on each shard.
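The "at most once" requirement can be illustrated with a small model. This is a sketch under stated assumptions, not the actual recordCurOpMetrics() logic: `makeShardMetrics` and `recordOperation` are hypothetical helpers, and the counter object merely stands in for a shard's serverStatus metrics.

```javascript
// Hypothetical per-shard counters, standing in for serverStatus metrics.
function makeShardMetrics() {
  return { scanAndOrder: 0 };
}

// Record one finished aggregation on a shard. `stageSortFlags` holds one
// boolean per part of the operation that reported hasSortStage.
// The counter is bumped at most once, however many flags are true.
function recordOperation(metrics, stageSortFlags) {
  if (stageSortFlags.some(Boolean)) {
    metrics.scanAndOrder += 1;
  }
  return metrics;
}

const shard = makeShardMetrics();
recordOperation(shard, [true, true]); // two parts reported a sort
console.log(shard.scanAndOrder); // 1
```

Collapsing the per-stage flags into a single per-operation decision is what keeps the metric from being double-counted in this model.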



 Comments   
Comment by Charlie Swanson [ 20/Apr/18 ]

This issue was resolved during work on SERVER-24978. As of that work, we no longer use a $sort stage on the merging half of the pipeline.

Comment by Akira Kurogane [ 03/Dec/17 ]

Thanks Charlie, David.

Making this ticket only about avoiding a scanAndOrder increment when merging pre-sorted results sounds good to me. The users who pay close attention to this metric overlap almost 100% with the index-savvy users, I'd reckon, so that increment is the only one that will bother them.

Comment by Charlie Swanson [ 01/Dec/17 ]

After looking a little closer, it doesn't look like we're incrementing scanAndOrder once for each $sort stage. However, it does look like we are inappropriately incrementing this metric if the $sort stage is simply merging pre-sorted streams of documents. I've updated the title and description of this ticket accordingly.

akira.kurogane it is somewhat unclear whether multiple blocking $sort stages should count separately towards scanAndOrder. Right now I think there's a maximum of two increments of scanAndOrder per aggregation, which is probably not ideal. I've filed SERVER-32150 to track that discussion; we'll fix this $mergingPresorted issue separately.

cc david.storch

Generated at Thu Feb 08 04:28:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.