  1. WiredTiger
  2. WT-12987

Investigate ideas to improve compaction walk

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Compaction
    • Storage Engines
    • 5
    • StorEng - Defined Pipeline

      Currently, compaction walks every internal page from left to right, looking for eligible candidates. A page is eligible when its block offset lies in the last 10% or 20% of the file being compacted.

      This can waste time, because the walk may have to read many internal and leaf pages before it reaches the ones in the last 10% or 20%. Instead, we could use the extent list and read only the blocks located in that part of the file. This way, we would only visit eligible candidates.
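
      A minimal sketch of the extent-list idea, in Python for brevity rather than WiredTiger's C. It assumes the extent list can be viewed as (offset, size) pairs; the names Extent and extents_in_tail are hypothetical and are not WiredTiger APIs. Eligibility follows the description above: an extent qualifies when its offset lies in the last tail_pct percent of the file.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    """Hypothetical view of one allocated extent in the file."""
    offset: int  # byte offset of the extent within the file
    size: int    # length of the extent in bytes


def extents_in_tail(extents: List[Extent], file_size: int,
                    tail_pct: int = 10) -> List[Extent]:
    """Return the extents whose offset lies in the last tail_pct% of the file."""
    if not 0 < tail_pct <= 100:
        raise ValueError("tail_pct must be in (0, 100]")
    # First byte of the tail region, e.g. 90% of the file size for tail_pct=10.
    threshold = file_size - (file_size * tail_pct) // 100
    tail = [e for e in extents if e.offset >= threshold]
    # Sort by offset so the candidates can be read sequentially.
    return sorted(tail, key=lambda e: e.offset)


if __name__ == "__main__":
    allocated = [Extent(0, 100), Extent(850, 50), Extent(900, 50), Extent(950, 50)]
    print(extents_in_tail(allocated, file_size=1000, tail_pct=10))
    print(extents_in_tail(allocated, file_size=1000, tail_pct=20))
```

      The sketch only selects candidate blocks. How a selected block is mapped back to the page that references it, so that the page can be rewritten, is left open here.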

      There is another tree-walk scenario we should improve. If a checkpoint runs on the table that compaction is working on, compaction stops and restarts the walk. If the walk is slow and another checkpoint starts, the walk restarts again, and so on. Compaction may therefore keep reading the same pages, get interrupted by a checkpoint at the same point each time, and potentially never read any pages beyond that point. Instead, we could resume the walk from the last location reached before the interruption.
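
      A minimal sketch of a resumable walk, again in Python and not based on WiredTiger code. The function resumable_walk and its parameters are hypothetical. It assumes the walk position can be saved as a key rather than a page pointer, so that the position stays meaningful even if pages change while the walk is paused; should_yield stands in for "a checkpoint has started on this table".

```python
import bisect
from typing import Callable, Optional, Sequence, Tuple


def resumable_walk(page_keys: Sequence[int],
                   visit: Callable[[int], None],
                   should_yield: Callable[[], bool],
                   resume_key: Optional[int] = None) -> Tuple[bool, Optional[int]]:
    """Visit sorted page_keys left to right, starting after resume_key.

    Returns (finished, last_key). When finished is False, pass last_key back
    as resume_key to continue instead of restarting from the first page.
    """
    # bisect_right finds the first key greater than resume_key, which also
    # works if the saved key no longer exists in the list.
    start = 0 if resume_key is None else bisect.bisect_right(page_keys, resume_key)
    last = resume_key
    for key in page_keys[start:]:
        if should_yield():
            return False, last  # interrupted: remember where we stopped
        visit(key)
        last = key
    return True, last


def yield_after(limit: int) -> Callable[[], bool]:
    """Build a should_yield callback that interrupts after `limit` visits."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        return calls["n"] > limit

    return check


if __name__ == "__main__":
    keys = [10, 20, 30, 40, 50]
    visited = []
    finished, pos = False, None
    while not finished:
        # Each round is interrupted after two pages, yet the walk still
        # completes because it resumes from the saved position.
        finished, pos = resumable_walk(keys, visited.append, yield_after(2), pos)
    print(visited)
```

      With a restart-from-scratch walk and the same two-page budget, every round would revisit the first two pages and never get further, which is the failure mode described above.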

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            etienne.petrel@mongodb.com Etienne Petrel
            Votes:
            0
            Watchers:
            8
