Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Component: Atlas Streams
- Affects Version/s: ALL
Description
When restoring from a large state checkpoint, the restore fails, and it appears to be due to this error:
(ExceededMemoryLimit) $push used too much memory and cannot spill to disk. Memory limit: 104857600 bytes
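For context, 104857600 bytes is 100 MB, which appears to be the per-stage memory limit that accumulators such as $push are subject to inside $group. A rough sketch of the same class of failure on an ordinary cluster (collection and field names are hypothetical, and this goes through db.collection.aggregate() rather than a stream processor):

// Hypothetical repro: a $group whose $push accumulator collects more than
// ~100 MB of documents fails with ExceededMemoryLimit when it may not spill to disk.
db.inputColl.aggregate(
    [
        { $group: { _id: "$customerId", customerDocs: { $push: "$$ROOT" } } }
    ],
    { allowDiskUse: false } // forbid spilling, mirroring the "cannot spill to disk" case above
);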
The full logs for this can be obtained from this query in Splunk:
This stream processor (SP) first created a largish checkpoint (~5.8 GB) and was then stopped. When it was later restarted, the start failed with the error above. The pipeline the SP is running is:
const pipeline = [
    {
        $source: {
            connectionName: "Cluster0",
            db: "mk-testdb",
            coll: "inputColl",
            timeField: { $toDate: "$fullDocument.ts" }
        }
    },
    { $replaceRoot: { newRoot: "$fullDocument" } },
    {
        $project: {
            value: { $range: [1, "$idx"] },
            ts: "$ts"
        }
    },
    { $unwind: "$value" },
    {
        $addFields: {
            "customerId": { $mod: ["$value", 50] },
            "max": "$value",
            "idarray0": ["$_id", "$_id", "$_id", "$_id", "$_id", "$_id"],
            "idarray1": ["$_id", "$_id", "$_id", "$_id", "$_id", "$_id"],
            "idarray2": ["$_id", "$_id", "$_id", "$_id", "$_id", "$_id"],
            "idarray3": ["$_id", "$_id", "$_id", "$_id", "$_id", "$_id"]
        }
    },
    {
        $tumblingWindow: {
            interval: { size: NumberInt(3), unit: "hour" },
            allowedLateness: { size: NumberInt(0), unit: "second" },
            pipeline: [
                { $group: { _id: "$customerId", customerDocs: { $push: "$$ROOT" }, max: { $max: "$max" } } }
            ]
        }
    },
    { $project: { customerId: "$_id", max: "$max" } },
    {
        $merge: {
            into: { connectionName: "Cluster0", db: "mk-testdb", coll: "outputColl" }
        }
    }
];
The interesting thing about this failure is that the SP did not fail when taking the checkpoint, only when trying to restore from it.
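For reference, a rough sketch of the sequence that hit this, assuming the Atlas Stream Processing shell interface (sp.createStreamProcessor() / start() / stop()) and a hypothetical processor name; the pipeline is the array defined above:

// Hypothetical processor name; "pipeline" is the array defined above.
sp.createStreamProcessor("mkTestProcessor", pipeline);
sp.mkTestProcessor.start();   // runs and builds up a ~5.8 GB state checkpoint
sp.mkTestProcessor.stop();    // stopping succeeds; the checkpoint had already been taken
sp.mkTestProcessor.start();   // restarting fails while restoring from the checkpoint:
                              // (ExceededMemoryLimit) $push used too much memory and cannot spill to disk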