- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- 8
- StorEng - Refinement Pipeline
Summary
What is the problem or use case, what are we trying to achieve?
MongoDB (MDB), when working with replicated data, sometimes needs to commit its reads. Knowing the timestamps of the values that were read can simplify how MDB commits reads.
Motivation
- Does this affect any team outside of WT?
Yes, MDB. MDB is not blocked on this.
- How likely is it that this use case or problem will occur?
An expected code path for many workloads.
- If the problem does occur, what are the consequences and how severe are they?
Today MDB acquires a new WT snapshot and reads from the top of the oplog to get an upper bound on the data that may have been read. This is costly in multiple ways:
  - We only get an upper bound. In the expected case, the data read was committed long ago, yet we may still wait pessimistically.
  - We acquire a new snapshot and cursor, and perform a reverse cursor walk.
  - We use a new WT_SESSION on the same thread while the original WT_SESSION may still hold resources. It is a WT limitation that using two sessions concurrently can lead to deadlock.
- Is this issue urgent?
(Does this ticket have a required timeline? What is it?)
No
Acceptance Criteria (Definition of Done)
- An API that can expose either:
  - Individual durable timestamps for positioned cursors, or
  - The maximum of the durable timestamps of all updates returned by cursor->get_value calls within a transaction, presumably exposed on the WT_SESSION object.
- If the timestamp information has been cleared because it is older than the oldest timestamp:
  - It is acceptable to return 0.
  - It is not acceptable to return WT_NOTFOUND or another error.
- Testing
(What testing needs to be done as part of this ticket? Unit? Functional? Performance? Testing on the MongoDB side?)
Yes. I don't expect MDB to need specific testing for this.
- Documentation update
(Does this ticket require a change in the architecture guide? If yes, please create a corresponding doc ticket.)
Unlikely
[Optional] Suggested Solution
(Is there any suggested solution to handle this issue? Is it related to any existing WT ticket? Is it related to any previous issue fixed? If yes, link the WT ticket number using related to, depends on, dependent on by links)
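A new cursor method along the lines of:
WT_CURSOR::get_timestamp_and_value(WT_CURSOR *cursor, uint64_t *timestamp, ...)
where the trailing arguments receive the value in the same way as for WT_CURSOR::get_value (for example, a WT_ITEM).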