[SERVER-49027] Extend $group explain execution stats to report memory consumption per accumulator Created: 23/Jun/20  Updated: 29/Oct/23  Resolved: 08/Dec/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Diagnostics, Querying
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Ruoxin Xu
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-53303 Make sort and group execution stages ... Closed
is related to SERVER-44174 $push and $addToSet should restrict m... Closed
is related to SERVER-48380 Expose total data size in bytes proce... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2020-11-30, Query 2020-12-14

 Description   

We've seen several cases in the field of $group operations that use $addToSet or $push to build very large arrays, which can give the query a large memory footprint. In the related ticket SERVER-44174, we added a check that fails a query if a particular $addToSet or $push accumulator's memory consumption exceeds 100MB. However, understanding a query's memory consumption can still be difficult, since diagnostics in this area are limited.

To help users understand which operators in the query plan are consuming memory, we should extend the explain output for aggregate commands at "executionStats" and "allPlansExecution" verbosity to include, for each accumulator, the maximum memory footprint reached at query runtime.
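The request is for the peak memory per accumulator, not just a final figure, so the stage has to keep a high-water mark for each accumulator field. A minimal sketch of that bookkeeping follows, written in Python rather than the server's C++. The class and method names are hypothetical, and summing usage across all group keys for a given accumulator field is an assumption of this sketch, not a description of the server's implementation.

```python
from collections import defaultdict


class AccumulatorMemoryTracker:
    """Hypothetical bookkeeping for one $group stage.

    For each accumulator field it tracks the bytes currently held, summed
    over all group keys (an assumption of this sketch), and the highest
    total ever reached.
    """

    def __init__(self) -> None:
        self._current = defaultdict(int)  # field name -> bytes currently held
        self._peak = defaultdict(int)     # field name -> highest total seen

    def adjust(self, field: str, delta_bytes: int) -> None:
        # delta_bytes is positive when some group's accumulator grows and
        # negative when memory held by that accumulator is released.
        self._current[field] += delta_bytes
        self._peak[field] = max(self._peak[field], self._current[field])

    def max_accumulator_memory_usage_bytes(self) -> dict:
        # Peak usage per accumulator field, in the spirit of the proposed
        # maxAccumulatorMemoryUsageBytes explain field.
        return dict(self._peak)


if __name__ == "__main__":
    tracker = AccumulatorMemoryTracker()
    tracker.adjust("push", 1000)   # group A's $push array grows
    tracker.adjust("push", 500)    # group B's $push array grows
    tracker.adjust("push", -1000)  # group A's memory is released
    tracker.adjust("count", 72)
    print(tracker.max_accumulator_memory_usage_bytes())
    # {'push': 1500, 'count': 72}
```

Reporting the peak rather than the current total matters because memory can be released before explain runs; the "push" entry above stays at 1500 even though only 500 bytes are held at the end.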



 Comments   
Comment by Githook User [ 07/Dec/20 ]

Author:

Ruoxin Xu (ruoxin.xu@mongodb.com, username: RuoxinXu)

Message: SERVER-49027 Extend $group explain execution stats to report memory consumption per accumulator
Branch: master
https://github.com/mongodb/mongo/commit/58ddbe61d0de1cb53773486b1aa343194fc3da06

Comment by Katya Kamenieva [ 24/Nov/20 ]

I think this is fine.

Comment by Ruoxin Xu [ 24/Nov/20 ]

The current patch would add a new field called "maxAccumulatorMemoryUsageBytes" to the $group stage's explain output, along these lines:

"stages": [
    {
        // some other stage's explain output
    },
    {
        "$group" : {
            "_id" : "$b",
            "count" : {
                "$sum" : {
                    "$const" : 1
                }
            },
            "push" : {
                "$push" : "$bigStr"
            },
            "set" : {
                "$addToSet" : "$bigStr"
            }
        },
        "maxAccumulatorMemoryUsageBytes" : {
            "count" : NumberLong(3600),
            "push" : NumberLong(245680),
            "set" : NumberLong(57200)
        },
        "nReturned" : NumberLong(50),
        "executionTimeMillisEstimate" : NumberLong(6)
    }
]
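Assuming the layout proposed above, a client can locate the heaviest accumulator directly from the explain result. The Python sketch below builds an explain command wrapping an aggregate at one of the verbosities named in the description, then walks a parsed result. The helper names are hypothetical, the sample document is the example above with NumberLong values written as plain integers, and only the single top-level "stages" layout is handled; other output shapes (for example, sharded explain) may place the field elsewhere.

```python
VALID_VERBOSITIES = ("queryPlanner", "executionStats", "allPlansExecution")


def build_explain_command(collection: str, pipeline: list,
                          verbosity: str = "executionStats") -> dict:
    """Build an explain command document wrapping an aggregate (hypothetical helper).

    "explain" is inserted first because the command name must be the first key.
    """
    if verbosity not in VALID_VERBOSITIES:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "explain": {"aggregate": collection, "pipeline": pipeline, "cursor": {}},
        "verbosity": verbosity,
    }


def group_accumulator_memory(explain_doc: dict) -> list:
    """Return (field, bytes) pairs from every $group stage, largest first.

    Assumes a top-level "stages" array whose $group entries carry a
    "maxAccumulatorMemoryUsageBytes" document, as proposed in this ticket.
    """
    results = []
    for stage in explain_doc.get("stages", []):
        if "$group" not in stage:
            continue  # only $group stages report per-accumulator memory here
        results.extend(stage.get("maxAccumulatorMemoryUsageBytes", {}).items())
    # Sort by bytes, descending, so the heaviest accumulator comes first.
    return sorted(results, key=lambda item: item[1], reverse=True)


explain_doc = {
    "stages": [
        {"$cursor": {}},  # placeholder for the preceding stage's output
        {
            "$group": {
                "_id": "$b",
                "count": {"$sum": {"$const": 1}},
                "push": {"$push": "$bigStr"},
                "set": {"$addToSet": "$bigStr"},
            },
            "maxAccumulatorMemoryUsageBytes": {"count": 3600, "push": 245680, "set": 57200},
            "nReturned": 50,
            "executionTimeMillisEstimate": 6,
        },
    ]
}

if __name__ == "__main__":
    print(build_explain_command("coll", [{"$group": {"_id": "$b"}}])["verbosity"])
    # executionStats
    print(group_accumulator_memory(explain_doc))
    # [('push', 245680), ('set', 57200), ('count', 3600)]
```

With PyMongo, a document like the one from build_explain_command could be passed to Database.command; NumberLong values come back as bson.int64.Int64, which subclasses int, so the sorting above works unchanged.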

kateryna.kamenieva, do you have any thoughts on how the per-accumulator memory consumption stats should be represented in the explain output?

Generated at Thu Feb 08 05:18:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.