Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Execution
WiredTiger does not support transactional DDL operations, so when an ident in WiredTiger is dropped, its data is deleted immediately and it is no longer possible to read it at a timestamp prior to the drop timestamp.
To allow MongoDB users to read at a timestamp prior to the drop timestamp, we implement "two-phase dropping": a collection is first removed from the catalog, and its actual data (the ident for the collection and the idents for all its indexes) is deleted later. This works as follows:
When a user drops a collection, the ident for the collection and the idents for its indexes are added to a "drop pending ident reaper", which holds the list of idents waiting to be dropped. When the minimum of the oldest timestamp and the checkpoint timestamp in WiredTiger advances, we call DropPendingIdentReaper::dropIdentsOlderThan to drop the idents whose drop timestamps are older than that timestamp.
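The two phases can be sketched as a small Python model. This is not MongoDB's C++ implementation: the class name mirrors DropPendingIdentReaper, but `add_drop_pending_ident`, `drop_idents_older_than`, `on_timestamps_advanced`, the integer timestamps, and the strict `<` comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DropPendingIdentReaper:
    """Hypothetical toy model of the reaper, not MongoDB's actual class."""

    # ident name -> drop timestamp (integers stand in for real timestamps)
    pending: dict[str, int] = field(default_factory=dict)
    # idents already dropped, in the order they were dropped
    dropped: list[str] = field(default_factory=list)

    def add_drop_pending_ident(self, drop_ts: int, ident: str) -> None:
        # Phase one: the collection is gone from the catalog; remember its ident.
        self.pending[ident] = drop_ts

    def drop_idents_older_than(self, ts: int) -> None:
        # Phase two: drop every ident whose drop timestamp is strictly older
        # than ts (the strict comparison is an assumption of this sketch).
        for ident, drop_ts in sorted(self.pending.items(), key=lambda kv: kv[1]):
            if drop_ts < ts:
                del self.pending[ident]
                self.dropped.append(ident)


def on_timestamps_advanced(reaper: DropPendingIdentReaper,
                           oldest_ts: int, checkpoint_ts: int) -> None:
    # Reap up to the minimum of the oldest and checkpoint timestamps.
    reaper.drop_idents_older_than(min(oldest_ts, checkpoint_ts))


reaper = DropPendingIdentReaper()
reaper.add_drop_pending_ident(10, "collection-1")
reaper.add_drop_pending_ident(10, "index-1")
reaper.add_drop_pending_ident(20, "collection-2")
on_timestamps_advanced(reaper, oldest_ts=25, checkpoint_ts=15)
print(reaper.dropped)  # ['collection-1', 'index-1']
print(reaper.pending)  # {'collection-2': 20}
```

Here the checkpoint timestamp (15) is the lower of the two, so only the idents dropped at timestamp 10 are reaped.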
However, to keep an ident from being dropped while reads on it are still in progress, the drop pending ident reaper also keeps a weak_ptr to an Ident object for each ident, and does not actually drop the ident while outstanding references to it remain. For collection idents, these references live in the RecordStore owned by each CollectionImpl instance. For index idents, they live in the SortedDataInterface, which is owned by the IndexAccessMethod, which is owned by the IndexCatalogEntry, which is owned by the IndexCatalogImpl, which is owned by the CollectionImpl.
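The reference check can be modeled with Python's `weakref.ref`, which, like `std::weak_ptr`, observes an object without keeping it alive. In this hypothetical sketch the `Ident` class and the reaper's method names are assumptions, and the immediate release on `del` relies on CPython's reference counting; in C++ the equivalent test would use `weak_ptr::lock()` or `weak_ptr::expired()`.

```python
import weakref


class Ident:
    """Hypothetical stand-in for the object a RecordStore or SortedDataInterface holds."""

    def __init__(self, name: str) -> None:
        self.name = name


class DropPendingIdentReaper:
    """Toy model: a pending ident is dropped only once no strong references remain."""

    def __init__(self) -> None:
        # ident name -> (drop timestamp, weak reference to the in-use Ident)
        self._pending: dict[str, tuple[int, weakref.ref]] = {}
        self.dropped: list[str] = []

    def add_drop_pending_ident(self, drop_ts: int, ident: Ident) -> None:
        # Like weak_ptr, weakref.ref does not keep the Ident alive.
        self._pending[ident.name] = (drop_ts, weakref.ref(ident))

    def drop_idents_older_than(self, ts: int) -> None:
        for name, (drop_ts, ref) in list(self._pending.items()):
            if drop_ts >= ts:
                continue  # not old enough to drop yet
            if ref() is not None:
                continue  # still referenced (e.g. by an in-progress read); retry later
            del self._pending[name]
            self.dropped.append(name)


reaper = DropPendingIdentReaper()
in_use = Ident("collection-1")  # strong reference, as a CollectionImpl's RecordStore would hold
reaper.add_drop_pending_ident(10, in_use)

reaper.drop_idents_older_than(15)
print(reaper.dropped)  # [] because the ident is still referenced

del in_use  # last strong reference released
reaper.drop_idents_older_than(15)
print(reaper.dropped)  # ['collection-1']
```

The timestamp alone is not enough: an ident is only reaped once it is both old enough and no longer referenced, which is why every owner in the chains above has to hold the reference correctly.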
Managing these references across the ownership chains above is tricky, and WiredTiger already has a timestamp called the 'pinned timestamp', which is the "minimum of the oldest_timestamp and the oldest active reader". If we waited for the pinned timestamp to advance before dropping idents, instead of dropping them when the oldest timestamp advances, we could remove the need for reference counting ident usage inside MongoDB and simplify this whole garbage-collection process.
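A hypothetical Python sketch of the proposed alternative: the reaper stores only drop timestamps and compares them against the pinned timestamp, with no reference counting. `EngineModel`, `PinnedTimestampReaper`, and their fields are illustrative assumptions, and the sketch assumes every reader has a read timestamp; an ident a reader can still see was dropped after that reader's timestamp, so the reader holds the pinned timestamp below the drop timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class EngineModel:
    """Hypothetical model of the timestamps WiredTiger tracks."""

    oldest_ts: int
    # read timestamps of currently active readers, across all collections
    active_read_ts: list[int] = field(default_factory=list)

    def pinned_ts(self) -> int:
        # "minimum of the oldest_timestamp and the oldest active reader"
        return min([self.oldest_ts, *self.active_read_ts])


@dataclass
class PinnedTimestampReaper:
    """Hypothetical reaper that needs no reference counting."""

    pending: dict[str, int] = field(default_factory=dict)  # ident -> drop timestamp
    dropped: list[str] = field(default_factory=list)

    def reap(self, engine: EngineModel) -> None:
        pinned = engine.pinned_ts()
        for ident, drop_ts in list(self.pending.items()):
            # A reader that can still see this ident reads at some ts < drop_ts,
            # so it keeps pinned below drop_ts and the ident survives.
            if drop_ts < pinned:
                del self.pending[ident]
                self.dropped.append(ident)


engine = EngineModel(oldest_ts=30, active_read_ts=[12])
reaper = PinnedTimestampReaper(pending={"collection-1": 15})
reaper.reap(engine)            # pinned = 12, so nothing is dropped
engine.active_read_ts.clear()  # the reader at ts 12 finishes
reaper.reap(engine)            # pinned = 30, so collection-1 is dropped
print(reaper.dropped)  # ['collection-1']
```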
One possible downside is that the pinned timestamp is shared by readers across all collections: a long-running reader on one collection, with a read timestamp older than the oldest_timestamp, holds back the pinned timestamp and so prevents dropping the ident of another collection, even if that collection was dropped before the oldest_timestamp and has no remaining readers.
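The downside can be made concrete with a few numbers. In this Python sketch, `global_pinned_ts` models the single pinned timestamp, while `per_collection_bound` is a purely hypothetical contrast that counts only one collection's readers; WiredTiger does not provide the latter.

```python
def global_pinned_ts(oldest_ts: int, readers: dict[str, list[int]]) -> int:
    """Pinned timestamp shared by all collections: min of oldest_ts and every reader."""
    all_read_ts = [ts for per_coll in readers.values() for ts in per_coll]
    return min([oldest_ts, *all_read_ts])


def per_collection_bound(oldest_ts: int, readers: dict[str, list[int]], coll: str) -> int:
    """Hypothetical bound that only counts readers of one collection (for contrast)."""
    return min([oldest_ts, *readers.get(coll, [])])


oldest_ts = 20
readers = {"A": [5]}  # long-running reader on collection A at ts 5 (older than oldest_ts)
drop_ts_b = 10        # collection B was dropped at ts 10 and has no readers

print(drop_ts_b < global_pinned_ts(oldest_ts, readers))           # False: B's ident is held back
print(drop_ts_b < per_collection_bound(oldest_ts, readers, "B"))  # True
```

The reader on A pins the shared timestamp at 5, so B's ident (dropped at 10) cannot be reaped until that reader finishes, even though nothing reads B.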