WiredTiger / WT-7873

Investigate improving compact efficiency by having block manager identify blocks to migrate

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • 8
    • Storage - Ra 2021-11-15

      This idea came up in a discussion with donald.anderson, keith.bostic, and sue.loverso.

      Can we improve compact efficiency by having the block manager identify the blocks that need to be moved?  This way the upper layer wouldn't need to walk the BTree looking for blocks to move. 

      The block manager has the block allocation information, so it knows which blocks are live and which are free. The idea is that the higher layer (i.e., the layer above the block manager) would ask the block manager which blocks would be most helpful to move. In response, the block manager could suggest blocks to move (i.e., those with the highest offsets in the allocated extent list) and pass their address cookies back to the higher layer. The higher layer would then figure out what each block is (possibly by cracking the address cookie, or by reading the block and examining its contents) and write it to a different address. (There is synergy here with WT-6001: ideally we could just tell the block manager to move a block without unpacking and reconciling it.)
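A minimal sketch of the block-manager side, under stated assumptions: given the allocated extent list sorted by offset, return the allocated blocks that end beyond a target file size, highest offset first. The `extent_t` struct and `bm_compact_candidates()` are hypothetical names; a real version would hand back address cookies rather than raw offset/size pairs, and the sketch treats each list entry as a single block, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified allocated block: file offset and size in bytes. */
typedef struct {
    uint64_t off;
    uint64_t size;
} extent_t;

/*
 * Hypothetical block-manager helper: copy into "out" up to "max" allocated
 * blocks that end beyond "target", highest offset first. Those blocks are
 * what stop the file from being truncated to "target" bytes. Assumes "alloc"
 * is sorted by ascending offset and entries do not overlap. Returns the
 * number of candidates copied.
 */
static size_t
bm_compact_candidates(const extent_t *alloc, size_t n, uint64_t target,
    extent_t *out, size_t max)
{
    size_t count = 0;

    assert(out != NULL || max == 0);
    /* Walk backward from the end of the file, where a move helps most. */
    for (size_t i = n; i > 0 && count < max; --i) {
        const extent_t *ext = &alloc[i - 1];

        /* Sorted and non-overlapping: every earlier block also fits. */
        if (ext->off + ext->size <= target)
            break;
        out[count++] = *ext;
    }
    return (count);
}
```

For example, with blocks at offsets 0, 8192, 65536 and 131072 and a target of 65536, it returns the blocks at 131072 and 65536, in that order. Walking from the end of the list lets the block manager stop as soon as it reaches blocks that already fit below the target, without touching the rest of the list.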

      One challenge is overflow values. Overflow blocks contain no information identifying the leaf page that points to them. Since this is a rare case (especially in MongoDB), we could just ignore those blocks. Likewise, if we find other corner cases where it is hard (or impossible) to figure out what a block is, leaving that block where it is won't violate correctness.
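On the higher-layer side, the rule of leaving overflow and unidentifiable blocks in place could be sketched as a filter over the suggested candidates. Everything here (`blk_kind_t`, `candidate_t`, `filter_movable()`) is hypothetical, and the sketch assumes each block's kind has already been determined by cracking the address cookie or reading the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification the higher layer might derive for a block. */
typedef enum { BLK_INTERNAL, BLK_LEAF, BLK_OVERFLOW, BLK_UNKNOWN } blk_kind_t;

typedef struct {
    uint64_t off;    /* File offset suggested by the block manager. */
    blk_kind_t kind; /* What the higher layer decided the block is. */
} candidate_t;

/*
 * Hypothetical filter: keep only candidates the higher layer can rewrite.
 * Overflow blocks have no pointer back to their leaf page and unknown blocks
 * cannot be attributed, so both stay where they are; that costs some
 * compaction efficiency but never correctness. Compacts "c" in place,
 * stores the number skipped in "*skipped", and returns the number kept.
 */
static size_t
filter_movable(candidate_t *c, size_t n, size_t *skipped)
{
    size_t keep = 0;

    assert(skipped != NULL);
    *skipped = 0;
    for (size_t i = 0; i < n; ++i) {
        if (c[i].kind == BLK_INTERNAL || c[i].kind == BLK_LEAF)
            c[keep++] = c[i];
        else
            ++*skipped;
    }
    return (keep);
}
```

Filtering in place keeps the surviving candidates in the block manager's suggested order, so the highest-offset movable blocks are still handled first.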

      A scheme like this could also be useful when we implement garbage collection for tiered storage, which faces the same problem as compact: an object holds only a small number of blocks of live data, and we want to identify those blocks and rewrite them so they will be allocated elsewhere.
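For tiered storage, the same allocation information could drive a simple decision about whether an object is worth collecting. This is a minimal sketch assuming a hypothetical percentage threshold and hypothetical helper names (`live_bytes()`, `gc_worthwhile()`): it sums the live bytes in an object's allocated extent list and compares that total with the object's size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified live block in an object: offset and size. */
typedef struct {
    uint64_t off;
    uint64_t size;
} extent_t;

/* Sum the bytes still referenced (live) in one object. */
static uint64_t
live_bytes(const extent_t *alloc, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; ++i)
        total += alloc[i].size;
    return (total);
}

/*
 * Hypothetical policy: an object is worth collecting when its live data is
 * below "pct" percent of the object's size. Its live blocks (the entries in
 * "alloc") would then be rewritten elsewhere so the object can be removed.
 */
static bool
gc_worthwhile(const extent_t *alloc, size_t n, uint64_t object_size,
    unsigned pct)
{
    assert(pct <= 100);
    if (object_size == 0)
        return (false);
    /* Compare in integer arithmetic: live/size < pct/100. */
    return (live_bytes(alloc, n) * 100 < object_size * pct);
}
```

The threshold trades write amplification against space reclamation: a low percentage means only mostly-dead objects are rewritten, so each collection copies few live bytes.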

       

            Assignee:
            haseeb.bokhari@mongodb.com Haseeb Bokhari (Inactive)
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0
            Watchers:
            6

              Created:
              Updated:
              Resolved: