[SERVER-85576] $push accumulator memory usage Created: 23/Jan/24  Updated: 23/Jan/24

Status: Needs Scheduling
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mayuresh Kulkarni Assignee: Backlog - Atlas Streams
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Atlas Streams
Operating System: ALL
Participants:

 Description   

Restoring from a large state checkpoint fails, apparently because of this error:

 

(ExceededMemoryLimit) $push used too much memory and cannot spill to disk. Memory limit: 104857600 bytes
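
For scale, the limit quoted in the error can be converted to binary megabytes. This is only an arithmetic sketch; the variable names are illustrative and do not correspond to any server parameter.

```javascript
// Limit taken verbatim from the error message above.
const memoryLimitBytes = 104857600;

// One MiB is 1024 * 1024 bytes.
const bytesPerMiB = 1024 * 1024;

const memoryLimitMiB = memoryLimitBytes / bytesPerMiB;
console.log(memoryLimitMiB); // 100
```

So the $push accumulator is hitting a 100 MiB cap.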

The full logs can be retrieved with this Splunk query:

link title

 

This stream processor (SP) first created a fairly large checkpoint (~5.8 GB) and was then stopped. When it was started again later, the start failed with the error above. The pipeline the SP runs is:

 

const pipeline = [
  {
    $source: {
      connectionName: "Cluster0",
      db: "mk-testdb",
      coll: "inputColl",
      timeField: { $toDate: "$fullDocument.ts" }
    }
  },
  { $replaceRoot: { newRoot: "$fullDocument" } },
  {
    $project: {
      value: { $range: [1, "$idx"] },
      ts: "$ts"
    }
  },
  { $unwind: "$value" },
  {
    $addFields: {
      "customerId": { $mod: ["$value", 50] },
      "max": "$value",
      "idarray0": ["$_id", "$_id", "$_id", "$_$id", "$_id", "$_id"],
      "idarray1": ["$_id", "$_id", "$_id", "$_$id", "$_id", "$_id"],
      "idarray2": ["$_id", "$_id", "$_id", "$_$id", "$_id", "$_id"],
      "idarray3": ["$_id", "$_id", "$_id", "$_$id", "$_id", "$_id"]
    }
  },
  {
    $tumblingWindow: {
      interval: { size: NumberInt(3), unit: "hour" },
      allowedLateness: { size: NumberInt(0), unit: "second" },
      pipeline: [{
        $group: { _id: "$customerId", customerDocs: { $push: "$$ROOT" }, max: { $max: "$max" } }
      }]
    }
  },
  { $project: { customerId: "$_id", max: "$max" } },
  {
    $merge: {
      into: { connectionName: "Cluster0", db: "mk-testdb", coll: "outputColl" }
    }
  }
];
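
To make the state growth in this pipeline easier to reason about, here is a rough plain-JavaScript model of one window. It is a sketch, not the server's implementation: `simulateWindow` and the sample input document are hypothetical, and "$_$id" is modeled as null on the assumption that it resolves to a missing field. It mirrors the $range/$unwind fan-out, the 50 groups produced by $mod, and the fact that $push: "$$ROOT" retains every document while $max keeps a single value.

```javascript
// Simplified in-memory model of one tumbling window of the pipeline above.
// Hypothetical helper for illustration only; not how the server stores state.
function simulateWindow(inputDocs) {
  const groups = new Map(); // customerId -> { customerDocs, max }
  for (const doc of inputDocs) {
    // $range: [1, "$idx"] yields 1 .. idx - 1 (the end is exclusive);
    // $unwind then emits one document per value.
    for (let value = 1; value < doc.idx; value++) {
      // "$_$id" is modeled as null here (assumption: the field is missing).
      const ids = [doc._id, doc._id, doc._id, null, doc._id, doc._id];
      const root = {
        _id: doc._id,
        value: value,
        ts: doc.ts,
        customerId: value % 50, // $mod: ["$value", 50]
        max: value,
        idarray0: ids, idarray1: ids, idarray2: ids, idarray3: ids,
      };
      let group = groups.get(root.customerId);
      if (!group) {
        group = { customerDocs: [], max: -Infinity };
        groups.set(root.customerId, group);
      }
      group.customerDocs.push(root);             // $push: "$$ROOT" keeps the whole document
      group.max = Math.max(group.max, root.max); // $max keeps one value
    }
  }
  return groups;
}

// One input document with idx = 101 fans out to 100 documents across 50 groups.
const groups = simulateWindow([{ _id: "a", idx: 101, ts: "2024-01-23T00:00:00Z" }]);
console.log(groups.size, groups.get(0).customerDocs.length, groups.get(0).max); // 50 2 100
```

In this model the customerDocs array of each group grows with every unwound document until the window closes, whereas max stays constant in size. Note also that the $project stage after the window does not reference customerDocs.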

 

Notably, the SP did not hit this error while taking the checkpoint; it fails only when restoring from it.


Generated at Thu Feb 08 06:58:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.