  1. Core Server
  2. SERVER-84146

Save/Restore of checkpoint should account for duplicated documents

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams

      Consider a pipeline with a `$hoppingWindow`, like this one:

       

      pipeline: [{
        $group: { _id: "$customerId", customerDocs: { $push: "$$ROOT" } }
      }]
      

      Say there are 200 open windows. An incoming document will be absorbed into all of these open windows. Although the logical state size is O(200 docs), the actual memory usage is still just 1 doc, since documents are cheaply copyable via mechanisms such as ref-counting.
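To make the gap between logical and physical state size concrete, here is a minimal sketch in Python. It is only an analogy: Python object references stand in for the server's ref-counted documents, and the names `NUM_WINDOWS`, `windows`, `logical_docs`, and `physical_docs` are hypothetical, not taken from the server code.

```python
# Hypothetical model: each open window holds a list of references to documents.
NUM_WINDOWS = 200

doc = {"customerId": 7, "amount": 12.5}

# Every open window absorbs the same document object (a reference, not a copy).
windows = [[doc] for _ in range(NUM_WINDOWS)]

# Logical size: how many documents the windows appear to hold in total.
logical_docs = sum(len(w) for w in windows)

# Physical size: how many distinct document objects actually exist in memory.
physical_docs = len({id(d) for w in windows for d in w})
```

In this model the 200 windows logically hold 200 documents, while only one document object is actually allocated.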

      When such a state is checkpointed and recovered, this sharing information is lost, so today we end up with 200 separate copies of the document after recovery.

      This causes ballooning in memory usage after the checkpoint has been recovered.
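One possible direction, sketched below in Python under stated assumptions, is to write each distinct document once at save time and have windows refer to it by index, so that restore can rebuild the sharing. This is an illustration only, not the server's checkpoint format: the functions `save_naive`, `save_deduped`, `restore_deduped`, and `distinct_docs` are hypothetical, JSON stands in for the real serialization, and object identity (`id`) stands in for whatever key a real implementation would use to detect shared documents.

```python
import json

def save_naive(windows):
    """Serialize every window independently; a shared doc is written once per window."""
    return json.dumps(windows)

def save_deduped(windows):
    """Write each distinct doc once; windows store indexes into the doc table."""
    table = []      # distinct documents, in first-seen order
    index_of = {}   # id(document) -> position in table (assumed identity key)
    refs = []
    for window in windows:
        window_refs = []
        for d in window:
            key = id(d)
            if key not in index_of:
                index_of[key] = len(table)
                table.append(d)
            window_refs.append(index_of[key])
        refs.append(window_refs)
    return json.dumps({"docs": table, "windows": refs})

def restore_deduped(blob):
    """Rebuild windows so that equal indexes point at the same restored object."""
    state = json.loads(blob)
    docs = state["docs"]
    return [[docs[i] for i in window_refs] for window_refs in state["windows"]]

def distinct_docs(windows):
    """Count distinct document objects held across all windows."""
    return len({id(d) for w in windows for d in w})

doc = {"customerId": 7, "amount": 12.5}
windows = [[doc] for _ in range(200)]

naive_restored = json.loads(save_naive(windows))
deduped_restored = restore_deduped(save_deduped(windows))
```

With the naive round trip, each window gets its own copy of the document after restore. With the deduplicated round trip, all windows share one restored object again, and the checkpoint itself is smaller because the document body is written once.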

            Assignee:
            Unassigned
            Reporter:
            Mayuresh Kulkarni (mayuresh.kulkarni@mongodb.com)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: