Details
Type: Improvement
Resolution: Won't Fix
Priority: Major - P3
Description
I wrote this up in June 2009, so it might be out of date; I'm just putting it in jira for safekeeping. Looking at what I wrote back then, I think I preferred the deleted record globbing approach because it didn't necessitate a data format change. But there could be a minor (backwards compatible?) format change that we could do in addition to the globbing, which would be more robust and would allow us to do some additional validation.
Right now records and deleted records are formatted so we can scan
through them using linked lists, but I believe the data format is such
that we can alternatively scan along an extent by looking at whatever
record or deleted record comes right after a given record or deleted
record on disk. (If I know the offset of a given record, I can just
add the length of that record to its offset to get the offset of the
next record / deleted record.) So in this way if you have a deleted
record you want to reclaim, you can just scan along the disk for the
next regular record. Since regular records are in a doubly linked
list, you can then splice in the deleted record as a new record.
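To make the offset arithmetic concrete, here is a minimal sketch of scanning forward on disk and splicing a reclaimed deleted record into the record list. The header layout and field names are assumptions for illustration only, not the actual on-disk format, and the head-of-list case is ignored.
{code:cpp}
#include <cstdint>

// Assumed record header layout (illustrative only).
struct RecordHeader {
    int32_t lengthWithHeaders;  // total bytes this record occupies on disk
    int32_t prevOfs;            // offset of previous record in the doubly linked list
    int32_t nextOfs;            // offset of next record in the doubly linked list
    // ... record data follows
};

// Given the base of an extent and the offset of a record (regular or deleted)
// within it, compute the offset of whatever comes next on disk: just add the
// record's length to its offset.
inline int32_t nextOnDisk(const char* extentBase, int32_t recOfs) {
    const RecordHeader* r =
        reinterpret_cast<const RecordHeader*>(extentBase + recOfs);
    return recOfs + r->lengthWithHeaders;
}

// Splice a reclaimed deleted record into the doubly linked record list,
// immediately before the regular record found by scanning forward on disk.
// (A real implementation would also handle the case where that regular
// record is the first record in the extent.)
inline void spliceBefore(char* extentBase, int32_t newOfs, int32_t nextRegularOfs) {
    RecordHeader* next = reinterpret_cast<RecordHeader*>(extentBase + nextRegularOfs);
    RecordHeader* prev = reinterpret_cast<RecordHeader*>(extentBase + next->prevOfs);
    RecordHeader* rec  = reinterpret_cast<RecordHeader*>(extentBase + newOfs);
    rec->prevOfs  = next->prevOfs;
    rec->nextOfs  = nextRegularOfs;
    prev->nextOfs = newOfs;
    next->prevOfs = newOfs;
}
{code}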
One issue is that it may not be possible to differentiate a deleted
record from a regular record just by looking at the bit
representation. There are a couple of ways of dealing with this. One is to
change the data format. I think a nicer way would be to get aggressive about
aggregating deleted records when possible. By this I mean that when we
process a write operation we never allow two deleted records to be
next to each other as a result of that operation; the deleted records
must be merged into one. If we adopt this policy, then the region on
disk directly following a deleted record must be a real record, so we
can easily splice a deleted record into the regular record list. (We
already track the last record per extent in the extent header, so if a
deleted record at the end of an extent is to be reclaimed we can still
splice it in easily.)
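A rough sketch of what that merge step could look like, again using an assumed header layout and an assumed doubly linked free list for deleted records (the real free list structure may differ):
{code:cpp}
#include <cstdint>

// Assumed deleted record header layout (illustrative only).
struct DeletedRecordHeader {
    int32_t lengthWithHeaders;  // total bytes this deleted record occupies on disk
    int32_t prevDeletedOfs;     // previous entry in the free list, or -1
    int32_t nextDeletedOfs;     // next entry in the free list, or -1
};

// Merge a deleted record with the deleted record that immediately follows it
// on disk, so that the region directly after any deleted record is always a
// real record (or the end of the extent).
inline void mergeWithNextDeleted(char* extentBase, int32_t firstOfs) {
    DeletedRecordHeader* first =
        reinterpret_cast<DeletedRecordHeader*>(extentBase + firstOfs);
    int32_t secondOfs = firstOfs + first->lengthWithHeaders;
    DeletedRecordHeader* second =
        reinterpret_cast<DeletedRecordHeader*>(extentBase + secondOfs);

    // Unhook the second deleted record from the free list.
    if (second->prevDeletedOfs >= 0)
        reinterpret_cast<DeletedRecordHeader*>(extentBase + second->prevDeletedOfs)
            ->nextDeletedOfs = second->nextDeletedOfs;
    if (second->nextDeletedOfs >= 0)
        reinterpret_cast<DeletedRecordHeader*>(extentBase + second->nextDeletedOfs)
            ->prevDeletedOfs = second->prevDeletedOfs;

    // Absorb its space into the first deleted record.
    first->lengthWithHeaders += second->lengthWithHeaders;
}
{code}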
If I remember correctly we do some deleted record globbing like this already,
but not after every write operation. If records are ordered, we can do
globbing efficiently on every write: when a record is to be converted to a
deleted record, we find the regular records adjacent on disk to the record
being deleted and then convert all the disk space between them into a single
deleted record (the one regular record and up to two deleted records in this
region can easily be unhooked from their doubly linked lists).
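A sketch of that per-write globbing step, assuming the caller can already identify whether the regions directly before and after the record being deleted are themselves deleted records (all names are illustrative; offsets of -1 mean "no such neighbor"):
{code:cpp}
#include <cstdint>

// The single coalesced deleted record that replaces the deleted record(s)
// adjacent on disk plus the record now being deleted.
struct GlobResult {
    int32_t ofs;   // start of the coalesced deleted record
    int32_t len;   // its total length
};

inline GlobResult globOnDelete(int32_t recOfs, int32_t recLen,
                               int32_t prevDeletedOfs, int32_t prevDeletedLen,
                               int32_t nextDeletedOfs, int32_t nextDeletedLen) {
    GlobResult out;
    // The coalesced region starts at the deleted record directly before the
    // record on disk, if there is one; otherwise at the record itself.
    out.ofs = (prevDeletedOfs >= 0) ? prevDeletedOfs : recOfs;
    out.len = recLen;
    if (prevDeletedOfs >= 0) out.len += prevDeletedLen;
    if (nextDeletedOfs >= 0) out.len += nextDeletedLen;
    // The caller unhooks the one regular record from the record list and the
    // <=2 deleted records from the free list, then registers a single deleted
    // record covering (out.ofs, out.len).
    return out;
}
{code}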