[DOCS-14454] Investigate changes in SERVER-48380: Expose total data size in bytes processed by $sort and $group in agg execution stats explain Created: 13/May/21  Updated: 13/Nov/23  Due: 27/Aug/21  Resolved: 23/Aug/21

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 4.9.0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Joseph Dougherty
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-48380 Expose total data size in bytes proce... Closed
Participants:
Days since reply: 2 years, 24 weeks, 5 days ago
Epic Link: DOCSP-9747
Story Points: 3

 Description   

Downstream Change Summary

When you run explain on an aggregation pipeline that contains $sort or $group, and the verbosity is executionStats or allPlansExecution, the output for each of those stages includes additional fields that expose the amount of data the stage processed.

The new fields are, for each stage:

$sort:

  • totalDataSizeSortedBytesEstimate (in bytes)
  • usedDisk (boolean)

$group:

  • totalDataSizeGroupedBytesEstimate (in bytes)
  • usedDisk (boolean)
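
As a rough illustration of how these fields might be read back, the sketch below walks an explain result and pulls the listed fields out of each $sort and $group stage. It assumes the result exposes a top-level `stages` array whose entries carry the stage name as a key; `collectStageStats` and `sampleExplain` are hypothetical names, and the sample values are made up rather than taken from a real server. The field names come from the summary above, and a given server release may report different ones, in which case the corresponding value comes back undefined.

```javascript
// Field names as listed in the change summary; treat them as assumptions,
// since a particular server release may use different names.
const STAGE_FIELDS = {
  $sort: ["totalDataSizeSortedBytesEstimate", "usedDisk"],
  $group: ["totalDataSizeGroupedBytesEstimate", "usedDisk"],
};

// Collect the data-size fields for every $sort and $group stage.
// Assumes `explainResult.stages` is an array of per-stage objects.
function collectStageStats(explainResult) {
  const stages = Array.isArray(explainResult.stages) ? explainResult.stages : [];
  const found = [];
  for (const stage of stages) {
    for (const [name, fields] of Object.entries(STAGE_FIELDS)) {
      if (!(name in stage)) continue;
      const stats = { stage: name };
      // A field the server did not report is recorded as undefined.
      for (const field of fields) stats[field] = stage[field];
      found.push(stats);
    }
  }
  return found;
}

// Hand-written sample shaped like explain output (illustrative values only).
const sampleExplain = {
  stages: [
    { $cursor: {}, nReturned: 3 },
    { $sort: { sortKey: { localField: 1 } }, totalDataSizeSortedBytesEstimate: 120, usedDisk: false, nReturned: 3 },
    { $group: { _id: { $const: null } }, totalDataSizeGroupedBytesEstimate: 40, usedDisk: false, nReturned: 1 },
  ],
};

console.log(collectStageStats(sampleExplain));
```

In mongosh, the same function could be applied to the object returned by db.collection.explain("executionStats").aggregate(...), keeping in mind that numeric values there may be Long instances rather than plain numbers.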

Description of Linked Ticket

SERVER-21784 recently added execution stats to the agg execution layer, and exposed them via "executionStats" or "allPlansExecution" explain verbosities. That ticket, however, added only nReturned and executionTimeMillis for every stage. There are more stats that we can expose which will be useful for debugging and performance investigations.

One suggestion from alex.bevilacqua is to expose the amount of data processed by $sort or $group. We have such stats for sorts executed in the PlanStage layer, but not for sorts executed in the DocumentSource layer. The $sort stage would report a totalDataSizeSorted metric, and the $group stage would report totalDataSizeGrouped.

Another idea that we could consider implementing at the same time is to report usedDisk:true when either a $sort or a $group spills to disk at runtime.

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 20/Aug/21 ]

Author:

{'name': 'Joseph Dougherty', 'email': 'joseph.dougherty@mongodb.com', 'username': 'jmd-mongo'}

Message: DOCS-14454 expose data size in bytes processed by sort and group in agg execution stats explain
Branch: master
https://github.com/mongodb/docs/commit/e6498066223e653afacc045cdf88a15fbb84b3e9

Comment by Joseph Dougherty [ 26/Jul/21 ]

Hello rishab.joshi! Have you had a chance to look at this one?

Thank you,
Joe Dougherty

Comment by Joseph Dougherty [ 13/Jul/21 ]

Hello rishab.joshi!

I'm in the process of documenting these new attributes, but my test results don't seem correct. Would you mind taking a look?

I'm using MongoDB 5.0.0 with mongosh 1.0.0. I ran the sample query (found in the issue summary here), but the results differ from what I expected.

Sample query:

> db.local.explain("executionStats").aggregate([{$lookup:{from:'foreign', localField: 'localField', foreignField: 'foreignField', as: 'output'}}, {$sort: {localField: 1}}, {$group: {_id: null}}])

Here are my results for the $group stage:

 
{
  '$group': { _id: { '$const': null } },
  maxAccumulatorMemoryUsageBytes: {},
  totalOutputDataSizeBytes: Long("0"),
  usedDisk: false,
  nReturned: Long("0"),
  executionTimeMillisEstimate: Long("0")
}

I'm not sure why I'm seeing totalOutputDataSizeBytes when I expected to see totalDataSizeGroupedBytesEstimate.

Is there something else I need to do in order to trigger the desired behavior, or am I possibly looking in the incorrect place in the explain output?

Thanks for your help!
Joe

Generated at Thu Feb 08 08:10:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.