- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Storage Engines
- StorEng - Defined Pipeline
keith.bostic@mongodb.com asked a series of interesting questions about the code that manages whether a particular transaction has read and/or durable timestamps shared for the global transaction state to include. The answers to these questions seemed nuanced enough that I wanted to capture them permanently in JIRA, hence the ticket.
The questions were:
I have a question about this code, if you have a second:
    /*
     * __wt_txn_clear_read_timestamp --
     *     Clear a transaction's published read timestamp.
     */
    void
    __wt_txn_clear_read_timestamp(WT_SESSION_IMPL *session)
    {
        WT_TXN *txn;
        WT_TXN_SHARED *txn_shared;

        txn = session->txn;
        txn_shared = WT_SESSION_TXN_SHARED(session);

        if (F_ISSET(txn, WT_TXN_SHARED_TS_READ)) {
            /* Assert the read timestamp is greater than or equal to the pinned timestamp. */
            WT_ASSERT(session, txn_shared->read_timestamp >= S2C(session)->txn_global.pinned_timestamp);
            WT_WRITE_BARRIER();
            F_CLR(txn, WT_TXN_SHARED_TS_READ);
        }
        txn_shared->read_timestamp = WT_TS_NONE;
    }
First question: Is the purpose of the WT_TXN_SHARED_TS_READ flag to indicate whether or not the read-timestamp is set?
Second question: If the answer to the first question is “yes”, then what is the point of the write barrier? Shouldn’t this be written as:
    F_CLR(txn, WT_TXN_SHARED_TS_READ);
    WT_PUBLISH(txn_shared->read_timestamp, WT_TS_NONE);
That is, ensure the flag is cleared before the read-TS is set to 0?
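To make the ordering in the second question concrete, here is a minimal sketch using C11 atomics rather than WiredTiger's own macros. It assumes that WT_PUBLISH behaves like a write barrier followed by a store (approximated here with a release store), and it assumes, purely for illustration, that another thread may observe both the flag and the timestamp. The struct and function names are hypothetical stand-ins, not WiredTiger code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_NONE UINT64_C(0) /* Hypothetical stand-in for WT_TS_NONE. */

/* Hypothetical, simplified stand-in for the shared per-transaction state. */
struct txn_shared_sketch {
    _Atomic uint64_t read_timestamp;
};

/* Hypothetical, simplified stand-in for the transaction and its flag. */
struct txn_sketch {
    atomic_bool has_shared_read_ts; /* Plays the role of WT_TXN_SHARED_TS_READ. */
    struct txn_shared_sketch *shared;
};

/* Publish a read timestamp, then mark it as set. */
static void
sketch_set_read_timestamp(struct txn_sketch *txn, uint64_t ts)
{
    assert(ts != TS_NONE);
    atomic_store_explicit(&txn->shared->read_timestamp, ts, memory_order_release);
    atomic_store_explicit(&txn->has_shared_read_ts, true, memory_order_release);
}

/*
 * Clear the flag first, then publish TS_NONE with release semantics. A thread
 * that acquire-loads TS_NONE from read_timestamp is then guaranteed to also
 * see the flag cleared.
 */
static void
sketch_clear_read_timestamp(struct txn_sketch *txn)
{
    atomic_store_explicit(&txn->has_shared_read_ts, false, memory_order_relaxed);
    atomic_store_explicit(&txn->shared->read_timestamp, TS_NONE, memory_order_release);
}

/* Reader side: pair the release store above with an acquire load. */
static bool
sketch_reader_state_is_consistent(struct txn_sketch *txn)
{
    uint64_t ts = atomic_load_explicit(&txn->shared->read_timestamp, memory_order_acquire);
    bool flag = atomic_load_explicit(&txn->has_shared_read_ts, memory_order_relaxed);

    /* Having observed TS_NONE from a clear, the flag must not still be set. */
    return (ts != TS_NONE || !flag);
}
```

In this sketch the barrier sits between the flag update and the timestamp store, which is the order the second question proposes. Whether that is the order WiredTiger actually needs depends on the answer to the first question.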
Third question: __txn_assert_after_reads() doesn’t check WT_TXN_SHARED_TS_READ, although it does check shared-read-TS == 0. Is that correct?
Fourth question: now that a TS of 0 is out-of-bounds, is it possible the flag is no longer needed? And, I should note, the durable-TS has similar issues.
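The idea behind the fourth question can be sketched as follows: if 0 can never be a valid timestamp, the shared value itself can act as the "is set" indicator, and a separate flag becomes redundant. This is only an illustration under that assumption; the type and function names are hypothetical, and whether the flag can really be dropped in WiredTiger is exactly what this ticket asks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_NONE UINT64_C(0) /* Hypothetical stand-in for WT_TS_NONE. */

/* Hypothetical shared state: the timestamp doubles as its own "set" flag. */
struct shared_ts_sketch {
    _Atomic uint64_t read_timestamp;
};

/* A read timestamp is considered set exactly when it is not TS_NONE. */
static bool
sketch_has_read_ts(struct shared_ts_sketch *s)
{
    return (atomic_load_explicit(&s->read_timestamp, memory_order_acquire) != TS_NONE);
}

/* Publish a timestamp; 0 is rejected because it is reserved as the sentinel. */
static void
sketch_publish_read_ts(struct shared_ts_sketch *s, uint64_t ts)
{
    assert(ts != TS_NONE);
    atomic_store_explicit(&s->read_timestamp, ts, memory_order_release);
}

/* Clearing is a single store, so there is no flag/value ordering to get wrong. */
static void
sketch_clear_read_ts(struct shared_ts_sketch *s)
{
    atomic_store_explicit(&s->read_timestamp, TS_NONE, memory_order_release);
}
```

With a single variable there is no window in which the flag and the value disagree, which is the appeal of the approach.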
Fifth question: is there a simple statement of what the global rwlock is supposed to guarantee? For example, both __txn_assert_after_reads() and __wt_txn_set_read_timestamp() acquire it and my suspicion is they don’t need to. Given how heavily the MDB server updates the oldest/stable timestamps, avoiding that lock when setting a read timestamp, which is also a common operation, is probably worth doing.
Definition of done: please make sure this is appropriately documented (outside of Jira)