Save/Restore of checkpoint should account for duplicated documents

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams
    • 3

      Consider a pipeline with a `hoppingWindow` whose inner pipeline looks like this:

       

      pipeline: [{
        $group: {
          _id: "$customerId",
          customerDocs: { $push: "$$ROOT" }
        }
      }]
      

      Say there are 200 open windows. An incoming document is absorbed into all of them. Although the logical state size is O(200 docs), the actual memory usage is still roughly one document, because documents are cheap to copy (via ref-counting etc.) and every window ends up referencing the same underlying data.

      When such a state is checkpointed and later recovered, this sharing information is lost, so today recovery produces 200 separate copies of the document.

      As a result, memory usage balloons after the checkpoint has been recovered.
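
      One possible direction, sketched below in Python purely for illustration, is to write each distinct document once into a table and have every window store indices into that table, so that restore can rebuild the sharing. The function names (`save_checkpoint`, `restore_checkpoint`, `count_distinct_docs`) and the JSON layout are hypothetical and are not the actual checkpoint format; Python object identity stands in for the ref-counted storage described above.

```python
import json


def save_checkpoint(windows):
    """Serialize window state, writing each distinct document object once.

    `windows` is a list of windows; each window is a list of documents (dicts).
    Documents shared between windows are detected by object identity.
    """
    doc_table = []          # unique documents, in first-seen order
    index_by_identity = {}  # id(doc) -> position in doc_table
    window_refs = []        # per window: list of indices into doc_table

    for window in windows:
        refs = []
        for doc in window:
            key = id(doc)
            if key not in index_by_identity:
                index_by_identity[key] = len(doc_table)
                doc_table.append(doc)
            refs.append(index_by_identity[key])
        window_refs.append(refs)

    return json.dumps({"docs": doc_table, "windows": window_refs})


def restore_checkpoint(blob):
    """Rebuild windows so that equal indices map to the same document object."""
    state = json.loads(blob)
    docs = state["docs"]  # each document is materialized exactly once
    return [[docs[i] for i in refs] for refs in state["windows"]]


def count_distinct_docs(windows):
    """Count document objects by identity (a proxy for actual memory usage)."""
    return len({id(doc) for window in windows for doc in window})
```

      With one document pushed into 200 windows, a naive round trip such as `json.loads(json.dumps(windows))` materializes 200 separate document objects, whereas `restore_checkpoint(save_checkpoint(windows))` yields 200 windows that all reference a single restored document.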

            Assignee:
            Unassigned
            Reporter:
            Mayuresh Kulkarni
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: