- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Checkpoints, Metadata
- None
- Storage Engines - Persistence
- SE Persistence backlog
- None
The following race condition is currently possible:
- A node creates a new empty table, which adds the table's info to the WT_DISAGG_COPY_METADATA list.
- The node starts a checkpoint.
- After the checkpoint code releases the schema lock, another thread drops the table. Because the table is still empty, and therefore not dirty, the drop operation proceeds.
- At the end of the checkpoint, the checkpoint code copies updates from the local metadata table to the shared metadata table for all tables in the WT_DISAGG_COPY_METADATA list. But at this point, the table no longer exists.
Simply ignoring the missing table violates the desired API semantics of checkpoints, which are to capture the state of the system at the time the checkpoint begins. Even though schema operations are not transactional, this can still lead to unexpected behavior from the application's perspective. For example, _mdb_catalog could still contain the entry for the new table while the table itself is missing from the checkpoint.
Note that in disaggregated storage (unlike attached storage), it is still safe for followers to access dropped tables, provided that the node picked up a checkpoint with the table still present.
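The race can be illustrated with a small single-threaded model that replays the interleaving described above. This is only a sketch: the `Node` class, its methods, and the table URI and configuration string are hypothetical stand-ins, not WiredTiger code, and `copy_list` merely plays the role of the WT_DISAGG_COPY_METADATA list.

```python
# Toy model of the race. All names are illustrative, not WiredTiger APIs.

class Node:
    def __init__(self):
        self.local_metadata = {}    # table URI -> metadata string
        self.shared_metadata = {}   # what a checkpoint publishes
        self.copy_list = []         # stands in for the WT_DISAGG_COPY_METADATA list

    def create_table(self, uri, config):
        self.local_metadata[uri] = config
        self.copy_list.append(uri)  # remember to publish this table's metadata

    def drop_table(self, uri):
        # An empty table is not dirty, so nothing blocks the drop.
        del self.local_metadata[uri]

    def checkpoint_begin(self):
        # In the real system the schema lock is released before resolve runs;
        # the model has nothing to do at this point.
        pass

    def checkpoint_resolve(self):
        missing = []
        for uri in self.copy_list:
            if uri in self.local_metadata:
                self.shared_metadata[uri] = self.local_metadata[uri]
            else:
                missing.append(uri)  # the table vanished during the checkpoint
        self.copy_list.clear()
        return missing


node = Node()
node.create_table("table:t1", "key_format=S,value_format=S")
node.checkpoint_begin()
node.drop_table("table:t1")   # the other thread, after the schema lock is released
missing = node.checkpoint_resolve()
print(missing)                # ['table:t1']
print(node.shared_metadata)   # {}
```

In this model the table existed when the checkpoint began, yet the published metadata has no entry for it, which is the inconsistency described above.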
There are at least two ways to fix this:
- Collect the metadata of new tables at the beginning of a checkpoint, just in case.
- Unify the two mechanisms. Currently, WT_DISAGG_COPY_METADATA only records which tables need to be processed, and checkpoint resolve modifies the shared metadata table directly. Instead, whenever a schema operation or checkpoint resolve needs to modify the metadata, store a new version of the metadata in a list, and apply that list at the end of a checkpoint.
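Both options can be sketched with the same kind of toy model. Everything below is hypothetical (class names, methods, the `CONFIG` string) and is not the fix that was actually implemented. `SnapshotNode` models the first option by saving metadata at checkpoint begin and falling back to it at resolve. `PendingMetadata` models the second option, with two added assumptions: a drop is recorded as a `None` version, and only versions queued before the checkpoint began are applied, with later ones waiting for the next checkpoint.

```python
CONFIG = "key_format=S,value_format=S"  # illustrative table configuration


class SnapshotNode:
    """Option 1: collect metadata of new tables when the checkpoint begins."""

    def __init__(self):
        self.local_metadata = {}
        self.shared_metadata = {}
        self.copy_list = []   # stands in for the WT_DISAGG_COPY_METADATA list
        self.snapshot = {}

    def create_table(self, uri, config):
        self.local_metadata[uri] = config
        self.copy_list.append(uri)

    def drop_table(self, uri):
        del self.local_metadata[uri]

    def checkpoint_begin(self):
        # Save metadata for every listed table that still exists.
        self.snapshot = {uri: self.local_metadata[uri]
                         for uri in self.copy_list if uri in self.local_metadata}
        self.copy_list.clear()

    def checkpoint_resolve(self):
        # Prefer the current metadata; use the saved copy if the table is gone.
        for uri, saved in self.snapshot.items():
            self.shared_metadata[uri] = self.local_metadata.get(uri, saved)
        self.snapshot = {}


class PendingMetadata:
    """Option 2: one ordered list of metadata versions, applied at the end."""

    def __init__(self):
        self.shared_metadata = {}
        self.pending = []   # (uri, config) versions; config None means dropped
        self.cut = 0        # list length when the running checkpoint began

    def record(self, uri, config):
        # Called by schema operations; checkpoint resolve would also queue
        # its updates here in the proposal, which this model leaves out.
        self.pending.append((uri, config))

    def checkpoint_begin(self):
        self.cut = len(self.pending)

    def checkpoint_resolve(self):
        for uri, config in self.pending[:self.cut]:
            if config is None:
                self.shared_metadata.pop(uri, None)
            else:
                self.shared_metadata[uri] = config
        del self.pending[:self.cut]
        self.cut = 0


# Replay the race against option 1.
snap = SnapshotNode()
snap.create_table("table:t1", CONFIG)
snap.checkpoint_begin()
snap.drop_table("table:t1")
snap.checkpoint_resolve()

# Replay the race against option 2, then run a second checkpoint.
q = PendingMetadata()
q.record("table:t1", CONFIG)   # create
q.checkpoint_begin()
q.record("table:t1", None)     # drop during the checkpoint
q.checkpoint_resolve()
first = dict(q.shared_metadata)
q.checkpoint_begin()
q.checkpoint_resolve()
```

In both models the first checkpoint publishes `table:t1`, matching the state at the time the checkpoint began. In the second model the drop becomes visible only in the following checkpoint.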
- related to: SERVER-116085 Test failure with side writes ident missing on secondary (In Progress)