[SERVER-48390] group with $accumulator complains memory exceeds 100MBs on smaller than 96MBs collection Created: 23/May/20  Updated: 29/Oct/23  Resolved: 10/Aug/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.4.0-rc2
Fix Version/s: 4.7.0, 4.4.2

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Arun Banala
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Query 2020-08-24
Participants:

 Description   

I'm not sure how $accumulator contributes to tracking group size, but it appears to be way overcounting when passed documents (unless it's allocating a huge amount to JS in general?).

db.scores.aggregate([{$match:{game:/^G[123]/}},{$count:"c"}])
{ "c" : 368061 }
db.scores.aggregate([{$match:{game:/^G[123]/}},{$group:{_id:0, size:{$sum:{$bsonSize:"$$ROOT"}}}}])
{ "_id" : 0, "size" : 30215379 }  /* ~28.8MBs */
db.scores.aggregate([{$match:{game:/^G[123]/}}, {$group:{_id:"$game", top2: { $accumulator: { init: function() {  return null;   },  accumulateArgs: [ [1,2,3,4,5,"$player","$score"] ],                     accumulate: function(state, val) {    return state;     },                     merge: function(state1, state2) {                         return state1;  },      finalize: function(state) {   return state;   }   } }  } },{$count:"c"}])
{ "c" : 33 }
/* as soon as I pass a document as args */
db.scores.aggregate([{$match:{game:/^G[123]/}}, {$group:{_id:"$game", top2: { $accumulator: { init: function() {  return null;   },  accumulateArgs: [ {score:"$score"} ],                     accumulate: function(state, val) {    return state;     },                     merge: function(state1, state2) {                         return state1;  },      finalize: function(state) {   return state;   }   } }  } },{$count:"c"}])
Error: command failed: {
	"ok" : 0,
	"errmsg" : "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.",
	"code" : 292,
	"codeName" : "QueryExceededMemoryLimitNoDiskUseAllowed"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:618:17
assert.commandWorked@src/mongo/shell/assert.js:708:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1012:12
@(shell):1:1
/* with limit */
db.scores.aggregate([{$match:{game:/^G[123]/}},{$limit:335000}, {$group:{_id:"$game", top2: { $accumulator: { init: function() {  return null;   },  accumulateArgs: [ {score:"$score"} ],                     accumulate: function(state, val) {    return state;     },                     merge: function(state1, state2) {                         return state1;  },      finalize: function(state) {   return state;   }   } }  } },{$count:"c"}])
{ "c" : 30 }
db.scores.aggregate([{$match:{game:/^G[123]/}},{$limit:340000}, {$group:{_id:"$game", top2: { $accumulator: { init: function() {  return null;   },  accumulateArgs: [ {score:"$score"} ],                     accumulate: function(state, val) {    return state;     },                     merge: function(state1, state2) {                         return state1;  },      finalize: function(state) {   return state;   }   } }  } },{$count:"c"}])
Error: command failed: {
	"ok" : 0,
	"errmsg" : "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.", ...
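The two $limit runs above bracket the failure between 335000 and 340000 input documents. A minimal sketch of how that bracket could be narrowed by bisection follows. `findFailureThreshold` and `buildPipeline` are hypothetical helper names, not part of the shell, and the search assumes the failure is monotonic in the limit (every limit above the threshold fails). `lang: "js"` is added because released 4.4 versions require it; the original rc2 repro omitted it.

```javascript
// Hypothetical helper: smallest limit in (lo, hi] for which fails(limit) is true.
// Assumes fails(lo) === false, fails(hi) === true, and that failure is monotonic.
function findFailureThreshold(lo, hi, fails) {
  while (hi - lo > 1) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (fails(mid)) {
      hi = mid; // failure reproduced: threshold is at or below mid
    } else {
      lo = mid; // still succeeds: threshold is above mid
    }
  }
  return hi;
}

// Hypothetical helper: the repro pipeline from this ticket with a variable $limit.
function buildPipeline(limit) {
  return [
    { $match: { game: /^G[123]/ } },
    { $limit: limit },
    {
      $group: {
        _id: "$game",
        top2: {
          $accumulator: {
            init: function () { return null; },
            accumulateArgs: [{ score: "$score" }], // passing a document triggers the issue
            accumulate: function (state, val) { return state; },
            merge: function (state1, state2) { return state1; },
            finalize: function (state) { return state; },
            lang: "js",
          },
        },
      },
    },
    { $count: "c" },
  ];
}

// Usage in the mongo shell (needs a live server, so it is left commented out).
// Matching on the error text is an assumption about how the shell surfaces the failure.
// const fails = (limit) => {
//   try {
//     db.scores.aggregate(buildPipeline(limit)).toArray();
//     return false;
//   } catch (e) {
//     return /Exceeded memory limit/.test(e.message);
//   }
// };
// findFailureThreshold(335000, 340000, fails);
```

Each probe reruns the whole aggregation, so narrowing a 5000-document bracket takes about 13 runs.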

Collection stats:

db.scores.aggregate({$collStats:{storageStats:{scale:1024*1024}}},{$project:{"storageStats.wiredTiger":0,"storageStats.indexDetails":0}}).pretty()
{
	"ns" : "agg.scores",
	"host" : "asyas-mbp-4.lan:27017",
	"localTime" : ISODate("2020-05-23T18:44:17.107Z"),
	"storageStats" : {
		"size" : 95,
		"count" : 1130151,
		"avgObjSize" : 88,
		"storageSize" : 31,
		"freeStorageSize" : 0,
		"capped" : false,
		"nindexes" : 2,
		"indexBuilds" : [ ],
		"totalIndexSize" : 51,
		"totalSize" : 82,
		"indexSizes" : {
			"_id_" : 19,
			"game_1_score_-1" : 32
		},
		"scaleFactor" : 1048576
	}
}
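As a rough cross-check of the numbers reported above, the arithmetic below estimates how much memory $group would have to be charging per input document for the limit to trip where it does. It is a sketch under two assumptions that the ticket does not confirm: that the limit is 100 * 1024 * 1024 bytes, and that the tracked memory grows linearly with the number of input documents.

```javascript
// Figures taken from the shell output in this ticket.
const matchedDocs = 368061;    // {$count: "c"} after the $match
const matchedBytes = 30215379; // sum of $bsonSize over the matched documents
const lastPassingLimit = 335000;
const firstFailingLimit = 340000;

// Assumption: the $group memory limit is 100 MiB.
const limitBytes = 100 * 1024 * 1024;

// Actual size of the matched data.
const matchedMiB = matchedBytes / (1024 * 1024);
const avgDocBytes = matchedBytes / matchedDocs;

// Assumption: linear growth. The per-document charge then lies between these bounds.
const minChargedPerDoc = limitBytes / firstFailingLimit;
const maxChargedPerDoc = limitBytes / lastPassingLimit;

print_summary();

function print_summary() {
  const out = typeof print === "function" ? print : console.log; // shell or Node
  out("matched data (MiB): " + matchedMiB.toFixed(2));
  out("average document (bytes): " + avgDocBytes.toFixed(2));
  out("implied charge per input doc (bytes): " +
      minChargedPerDoc.toFixed(1) + " to " + maxChargedPerDoc.toFixed(1));
}
```

Under those assumptions the implied charge is roughly 308 to 313 bytes per input document against an average document of about 82 bytes, which is consistent with the suspicion that every incoming document is being counted rather than only the per-group state.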



 Comments   
Comment by Githook User [ 11/Sep/20 ]

Author:

{'name': 'Arun Banala', 'email': 'arun.banala@mongodb.com', 'username': 'banarun'}

Message: SERVER-48390 Exhaust pending calls when $group with $accumulator runs out of memory

(cherry picked from commit bf48331b0343b191c0d94aef888cdec471a6508b)
Branch: v4.4
https://github.com/mongodb/mongo/commit/8f7a5173f865ceb975d6a45add855146c5f052da

Comment by Githook User [ 10/Aug/20 ]

Author:

{'name': 'Arun Banala', 'email': 'arun.banala@mongodb.com', 'username': 'banarun'}

Message: SERVER-48390 Exhaust pending calls when $group with $accumulator runs out of memory
Branch: master
https://github.com/mongodb/mongo/commit/bf48331b0343b191c0d94aef888cdec471a6508b

Comment by Asya Kamsky [ 26/May/20 ]

In general, what's most confusing is that we seem to be counting incoming documents, which is strange since we don't process them all at the same time.
