Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Layered Tables
Storage Engines - Foundations
When we open a layered cursor on a follower, we look up the last checkpoint. If there hasn't been a checkpoint yet, we open a live btree for the empty stable table, which is fine. The logic looks like:
WT_ERR_NOTFOUND_OK(
    __wt_meta_checkpoint_last_name(session, stable_uri, &checkpoint_name, NULL, NULL), true);
if (ret == WT_NOTFOUND) {
    /* No checkpoint yet: open the live btree. */
}
However, there is a possible race: a checkpoint may arrive right after the call to __wt_meta_checkpoint_last_name, so we end up opening a shared live btree on a follower even though a checkpoint now exists. This could lead to data corruption: if a step-up event happens right after we do this, we would then write this old checkpoint to disk.
The scope of this ticket is to introduce a production assertion that double-checks that no checkpoint has arrived after we open the live btree. This assertion turns the problem from a data corruption issue into an availability problem.
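The double check described above could look roughly like the following toy model. This is only a sketch under stated assumptions, not the actual change: every name here (`model_metadata`, `model_checkpoint_last_name`, `model_open_stable`, `model_checkpoint_arrives`, the `MODEL_*` codes and the checkpoint name) is hypothetical and none of it is WiredTiger API. Where a real production assertion would take the process down, the model returns `MODEL_PANIC` so the behavior can be exercised in a test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes for this model; not WiredTiger error codes. */
enum { MODEL_OK = 0, MODEL_NOTFOUND = 1, MODEL_PANIC = 2 };

/* Hypothetical stand-in for the metadata that records the last checkpoint. */
struct model_metadata {
    const char *last_checkpoint; /* NULL until the first checkpoint arrives */
};

/* Hook that lets a caller simulate a checkpoint arriving in the race window. */
typedef void (*race_hook)(struct model_metadata *);

/* Look up the last checkpoint name; MODEL_NOTFOUND if there is none yet. */
static int
model_checkpoint_last_name(const struct model_metadata *meta, const char **namep)
{
    *namep = meta->last_checkpoint;
    return (*namep == NULL ? MODEL_NOTFOUND : MODEL_OK);
}

/* Simulated checkpoint arrival (the name is made up for the example). */
static void
model_checkpoint_arrives(struct model_metadata *meta)
{
    meta->last_checkpoint = "checkpoint-1";
}

/*
 * Open the stable table for a follower cursor. On success, *livep says
 * whether the live btree (true) or a checkpoint (false) was opened.
 */
static int
model_open_stable(struct model_metadata *meta, race_hook hook, bool *livep)
{
    const char *name;

    assert(meta != NULL && livep != NULL);
    *livep = false;

    /* First lookup: if a checkpoint exists, open it instead of the live btree. */
    if (model_checkpoint_last_name(meta, &name) == MODEL_OK)
        return (MODEL_OK);

    /* Race window: a checkpoint may arrive right after the lookup. */
    if (hook != NULL)
        hook(meta);

    /* No checkpoint was seen, so open the live btree for the empty table. */
    *livep = true;

    /* Double check: fail loudly if a checkpoint arrived in the meantime. */
    if (model_checkpoint_last_name(meta, &name) != MODEL_NOTFOUND) {
        *livep = false;
        return (MODEL_PANIC);
    }
    return (MODEL_OK);
}
```

In this model, calling `model_open_stable` with no hook on empty metadata opens the live btree, while passing `model_checkpoint_arrives` as the hook trips the double check. Failing the open is the trade described above: the follower becomes unavailable instead of later writing stale data after a step-up.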