Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Sprint:
Execution Team 2021-06-14, Execution Team 2021-06-28, Execution Team 2021-07-26, Execution Team 2021-08-23, Execution Team 2021-09-06, Execution Team 2021-09-20, Execution Team 2021-10-04, Execution Team 2021-11-01, Execution Team 2021-11-15, Execution Team 2022-02-21, Execution Team 2022-04-04, Execution Team 2022-05-16, Execution Team 2022-05-30
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

~~SERVER-57476~~ demonstrates a case where:

A transaction T1 becomes prepared, preparing some update A
Another transaction T2 reserves an oplog slot. This slot has an earlier timestamp than the prepare oplog entry of T1.
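T1 cannot commit until its prepare oplog entry has replicated to a majority of nodes. The oplog hole introduced by T2 prevents that entry from majority replicating.
T2 attempts to read the document that's currently prepared with A, hits a prepare conflict, and blocks while still holding its oplog slot.
T1 is now waiting on T2's oplog hole and T2 is waiting on T1's prepared update, so the system stalls.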
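
The interleaving above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not server code: `ToyOplog`, `no_holes_point`, and `simulate` are hypothetical names, timestamps are plain integers, and "majority replication" is reduced to the rule that replication cannot advance past the first unfilled oplog slot.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOplog:
    """Toy oplog: each timestamp is either a written 'entry' or a reserved 'hole'."""
    slots: dict = field(default_factory=dict)

    def reserve(self, ts):
        # Reserve a slot without writing it yet (this creates an oplog hole).
        self.slots[ts] = "hole"

    def write(self, ts):
        # Write an entry, filling the slot if it was previously reserved.
        self.slots[ts] = "entry"

    def no_holes_point(self):
        """Largest timestamp such that every slot at or before it is written."""
        point = 0
        for ts in sorted(self.slots):
            if self.slots[ts] == "hole":
                break
            point = ts
        return point


def simulate(t2_reads_prepared_doc):
    """Return True if the toy system stalls (assumption-laden sketch)."""
    oplog = ToyOplog()
    oplog.reserve(1)      # T2 reserves its slot at the earlier timestamp
    oplog.write(2)        # T1 writes its prepare entry at a later timestamp
    t1_prepare_ts = 2

    if not t2_reads_prepared_doc:
        # T2 never touches the prepared document, so it finishes and fills its slot.
        oplog.write(1)
    # Otherwise T2 blocks on the prepare conflict and its hole stays open.

    # T1 can commit only once its prepare entry is covered by the no-holes point.
    t1_can_commit = oplog.no_holes_point() >= t1_prepare_ts
    return not t1_can_commit


print(simulate(t2_reads_prepared_doc=True))
print(simulate(t2_reads_prepared_doc=False))
```

In this sketch the stall appears only when T2 both holds the earlier slot and reads the prepared document; if T2 fills its slot without touching that document, T1's prepare entry is covered and T1 can commit.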

~~SERVER-57476~~ plans to address the problem by returning a retryable error to the user when a transaction with a commit timestamp actually hits a prepare conflict. This targets the global problem, but the fix has no effect unless a system actually produces the interleaving illustrated by T1 and T2.

This ticket is to craft a set of criteria, local to a single operation, for determining when that operation may lead to the described stall. This is important because it's difficult for our system to thoroughly generate all combinations of operations that can bring out this interleaving.

However, there are challenges. It's not sufficient to simply invariant that anything entering a prepareConflictRetry loop must not be holding any resources (i.e., have a commit/durable timestamp):

Entering a prepareConflictRetry loop is safe on primaries when the operation has exclusive access to a collection.
The transaction may be ignoring prepare conflicts.
The system may be in a state (e.g., startup or rollback) where prepared transactions do not currently exist.
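
As a rough illustration of what an operation-local check might look like once the exceptions above are taken into account, here is a minimal sketch. Every name in it (`OpState`, its fields, and `may_stall_on_prepare_conflict`) is hypothetical and chosen for this example only; it is not the server's logic, and it covers only the three exceptions listed in this ticket, which may not be exhaustive.

```python
from dataclasses import dataclass


@dataclass
class OpState:
    # All fields are hypothetical, introduced only for this sketch.
    holds_timestamp: bool                   # has a commit/durable timestamp
    is_primary: bool                        # node is currently a primary
    has_exclusive_collection_access: bool   # operation has exclusive access to the collection
    ignores_prepare_conflicts: bool         # operation is ignoring prepare conflicts
    prepared_txns_can_exist: bool           # False during, e.g., startup or rollback


def may_stall_on_prepare_conflict(op: OpState) -> bool:
    """Return True if entering a prepare-conflict retry loop might stall.

    Naive rule: holding a timestamp while entering the loop is suspicious.
    The early returns below are the exceptions listed in the ticket.
    """
    if not op.holds_timestamp:
        return False  # nothing is held, so nothing can block replication
    if op.is_primary and op.has_exclusive_collection_access:
        return False  # described as safe on primaries with exclusive access
    if op.ignores_prepare_conflicts:
        return False  # the operation is ignoring prepare conflicts
    if not op.prepared_txns_can_exist:
        return False  # no prepared transactions exist in this state
    return True


risky = OpState(True, True, False, False, True)
exempt = OpState(True, True, True, False, True)
print(may_stall_on_prepare_conflict(risky), may_stall_on_prepare_conflict(exempt))
```

The early returns make each exception explicit, which shows why a bare invariant on "holds a timestamp" would fire in cases the ticket considers safe.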

is related to

SERVER-57476 Operation may block on prepare conflict while holding oplog slot, stalling replication indefinitely

Closed

Assignee: Gregory Noma
Reporter: Daniel Gottlieb (Inactive)
Participants: Daniel Gottlieb, Gregory Noma
Votes: 0
Watchers: 15

Created: Jun 07 2021 06:13:19 PM UTC
Updated: May 23 2022 03:22:51 PM UTC
Resolved: May 23 2022 03:22:50 PM UTC
Confidence Status Last Update: 03/May/22 6:48 PM
