SERVER-84146

Save/Restore of checkpoint should account for duplicated documents

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Atlas Streams

    Description

      Consider a pipeline with a `$hoppingWindow` stage, like this one:

       

      pipeline: [{
        $group: {
          _id: "$customerId",
          customerDocs: {$push: "$$ROOT"}
        }
      }]
      

      Say there are 200 open windows. An incoming document is absorbed into every one of them. Although the logical state size is O(200 docs), the actual memory usage is still roughly one document, since documents are cheaply copyable via ref-counting, so the windows share a single underlying copy.

      When such a state is checkpointed and recovered, this sharing information is lost, so today we end up with 200 separate copies of the document after recovery.
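
A minimal Python sketch of the effect, not the server's actual checkpoint code: 200 windows hold references to one document object, and a naive per-window serialize/deserialize round trip yields 200 independent copies. Here `json` stands in for whatever format the checkpoint really uses, and the window layout and helper names are assumptions made for illustration.

```python
import json

NUM_WINDOWS = 200

# One incoming document, absorbed by every open window. Each window's
# accumulator state holds a reference to the same object, not a copy.
doc = {"customerId": 7, "payload": "x" * 1024}
windows = [{"customerDocs": [doc]} for _ in range(NUM_WINDOWS)]


def distinct_docs(window_states):
    """Count distinct document objects held across all windows."""
    return len({id(d) for w in window_states for d in w["customerDocs"]})


def naive_checkpoint(window_states):
    """Serialize each window independently (hypothetical format)."""
    return [json.dumps(w) for w in window_states]


def naive_restore(blobs):
    """Deserialize each window independently; sharing is not recorded."""
    return [json.loads(b) for b in blobs]


before = distinct_docs(windows)
restored = naive_restore(naive_checkpoint(windows))
after = distinct_docs(restored)
print(before, after)  # prints "1 200"
```

The logical contents are identical before and after; only the number of physical copies changes.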

      This causes ballooning in memory usage after the checkpoint has been recovered.
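
One possible direction, sketched in Python under stated assumptions: write each distinct document once into a table, have each window store indices into that table, and rebuild the shared references on restore. Sharing is detected by object identity here; a real implementation might instead key on the address of the ref-counted buffer or on a content hash. The checkpoint layout and the function names are hypothetical.

```python
import json


def save_checkpoint(window_states):
    """Serialize windows, storing each distinct document object once."""
    table = []      # distinct documents, in first-seen order
    index_of = {}   # id(document) -> position in table
    layout = []     # per window: list of indices into table
    for w in window_states:
        refs = []
        for d in w["customerDocs"]:
            key = id(d)
            if key not in index_of:
                index_of[key] = len(table)
                table.append(d)
            refs.append(index_of[key])
        layout.append(refs)
    return json.dumps({"docs": table, "windows": layout})


def restore_checkpoint(blob):
    """Rebuild windows so that equal indices map to the same object."""
    data = json.loads(blob)
    docs = data["docs"]
    return [{"customerDocs": [docs[i] for i in refs]}
            for refs in data["windows"]]


doc_a = {"customerId": 7, "seq": 1}
doc_b = {"customerId": 7, "seq": 2}
windows = [{"customerDocs": [doc_a, doc_b]} for _ in range(200)]

restored = restore_checkpoint(save_checkpoint(windows))
distinct = len({id(d) for w in restored for d in w["customerDocs"]})
print(distinct)  # prints "2"
```

With this layout the checkpoint size and the post-recovery memory both scale with the number of distinct documents rather than with the number of (window, document) pairs.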

          People

            backlog-atlas-streams@mongodb.com Backlog - Atlas Streams
            mayuresh.kulkarni@mongodb.com Mayuresh Kulkarni