-
Type:
Task
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Summary
Companion TLA+ specification for SERVER-120304 ("Snapshot time may advance before fetching a command's operationTime"). Models the (AcquireSnapshot, ExecuteRead, FetchOperationTime, SendResponse) interleaving with a concurrent AdvanceCommittedSnapshot writer, parameterised by a UseLiveCommittedSnapshot boolean that toggles between the pre-fix code (live committed-snapshot read) and the fix (getLastUsedReadTimestamp()).
Files
- src/mongo/tla_plus/Replication/SnapshotOperationTimeOrdering/SnapshotOperationTimeOrdering.tla (253 lines)
- src/mongo/tla_plus/Replication/SnapshotOperationTimeOrdering/MCSnapshotOperationTimeOrdering.tla (15 lines)
- src/mongo/tla_plus/Replication/SnapshotOperationTimeOrdering/MCSnapshotOperationTimeOrdering.cfg (32 lines)
- src/mongo/tla_plus/Replication/SnapshotOperationTimeOrdering/OWNERS.yml (routes to 10gen/server-replication-reviewers)
- src/mongo/tla_plus/Replication/SnapshotOperationTimeOrdering/README.md
Invariant
SnapshotMatchesOperationTime == \A r \in executedReads : r.opTime <= r.snapshot
TLC verdicts
| Configuration | Result | States |
|---|---|---|
| UseLiveCommittedSnapshot = FALSE (fix) | No error | 1398 generated / 530 distinct |
| UseLiveCommittedSnapshot = TRUE (pre-fix) | SnapshotMatchesOperationTime violated at depth 8 | 455 generated / 245 distinct |
The bug trace: ClockTick → AcquireSnapshot(t1)@committedSnapshot=0 → AdvanceCommittedSnapshot→1 → ClockTick → ExecuteRead → FetchOperationTime (reads live 1) → SendResponse → {{executedReads =
{[snapshot |-> 0, opTime |-> 1]}}}. Mirrors the jstest in jstests/replsets/operation_time_majority_read_race.js.
How to run
cd src/mongo/tla_plus ./download-tlc.sh ./model-check.sh Replication/SnapshotOperationTimeOrdering
Flip UseLiveCommittedSnapshot to TRUE in the cfg to reproduce the counterexample.
Related
SERVER-120304— snapshot/operationTime ordering bug (PR #49451 → reverted #53585 → re-landed #53614 in ~30 days)
- is related to
-
SERVER-120304 Snapshot time may advance before fetching a command's operationTime
-
- Closed
-