Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Cursors, Layered Tables
Storage Engines - Foundations
45.732
The problem:
When a layered cursor on the follower performs a write that depends on a pre-existing value (remove, update, or an insert existence check), it consults only the session-visible state of the stable constituent. A committed stop_ts on the stable cell can be invisible at the session's read_ts but visible to the drain. The session can therefore issue a write on a key that, from the drain's timestamp-independent view, has nothing to operate on, which produces unresolvable state at drain time (e.g. the __layered_assert_tombstone_has_value_on_stable_btree assert seen in WT-17240).
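The mismatch can be illustrated with a small toy model. This is only a sketch under stated assumptions: `StableCell`, `visible_to_session`, and `visible_to_drain` are hypothetical names invented for illustration, not WiredTiger types or functions, and the visibility rule is simplified to a single start/stop timestamp pair.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StableCell:
    """Toy model of a value on the stable constituent (hypothetical type)."""
    value: str
    start_ts: int
    stop_ts: Optional[int] = None  # timestamp of a committed remove, if any


def visible_to_session(cell: StableCell, read_ts: int) -> Optional[str]:
    """Value as seen by a session reading at read_ts (simplified rule)."""
    if cell.start_ts > read_ts:
        return None  # value was written after the read timestamp
    if cell.stop_ts is not None and cell.stop_ts <= read_ts:
        return None  # the remove is visible at read_ts
    return cell.value


def visible_to_drain(cell: StableCell) -> Optional[str]:
    """Timestamp-independent view: any committed stop means no value."""
    return None if cell.stop_ts is not None else cell.value


# A value written at ts=10 and removed at ts=30, read by a session at ts=20.
cell = StableCell(value="v1", start_ts=10, stop_ts=30)
session_view = visible_to_session(cell, read_ts=20)
drain_view = visible_to_drain(cell)

# The session still sees a value to operate on, while the drain sees none.
hazard = session_view is not None and drain_view is None
```

In this model a remove or update issued by the session at read_ts=20 would appear valid to the session while having nothing to operate on from the drain's point of view.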
Additional context:
In MongoDB, writes on follower connections differ from regular writes:
- Only successful operations are committed to the oplog, so writes on followers should not produce write conflicts.
- Followers apply oplog writes without read timestamps, so all existing changes are visible to the writer.
Requirements:
To prevent potential data corruption in replication scenarios, and/or to simplify the test/format tasks, followers are required to detect write conflicts. Ideally, write-conflict detection on a follower is identical to that on the leader. Perfect detection may be impossible, though, because blind writes are required.
Solutions:
One option is to check every follower write against the stable constituent. However, this contradicts the requirement for blind writes (cursor config: overwrite=true), and the performance penalty of checking every write is too high.
Another option is to check against the stable constituent only if overwrite=false and a read timestamp is present. (This is the current solution.) A performance penalty exists, but not in the scenarios that MongoDB uses.
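The gating condition of the second option can be sketched as a small predicate. This is a minimal illustration, not the actual patch: the function names are hypothetical, `read_ts=None` stands in for "no read timestamp", and the conflict rule (a committed stop_ts newer than read_ts) is an assumption drawn from the problem description above.

```python
from typing import Optional


def should_check_stable(overwrite: bool, read_ts: Optional[int]) -> bool:
    """Pay for the stable lookup only for non-blind writes at a read timestamp."""
    return (not overwrite) and read_ts is not None


def has_hidden_stop(stop_ts: Optional[int], read_ts: int) -> bool:
    """True when a committed stop exists but is newer than the read timestamp."""
    return stop_ts is not None and stop_ts > read_ts


def follower_write_conflicts(
    overwrite: bool, read_ts: Optional[int], stop_ts: Optional[int]
) -> bool:
    """Report a write conflict only when the gated check finds a hidden stop."""
    if not should_check_stable(overwrite, read_ts):
        # Blind write, or no read timestamp: skip the extra lookup entirely.
        return False
    return has_hidden_stop(stop_ts, read_ts)
```

Under these assumptions, oplog application on followers (no read timestamp) and blind writes (overwrite=true) never take the extra lookup, which is why the penalty does not affect the MongoDB scenarios.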
is related to:
- WT-17240 test/format (disagg.mode=switch) __layered_assert_tombstone_has_value_on_stable_btree assertion error (Closed)
related to:
- WT-17278 Follower remove returns WT_NOTFOUND where leader returns WT_ROLLBACK, causing data mismatch in multi-node validation (Open)
- WT-17311 Modify that sees an outdated tombstone returned WT_NOTFOUND instead of WT_ROLLBACK (Backlog)
- WT-17247 Layered cursor writes on follower should check for potential write conflicts (In Code Review)