Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- car-product-sync

Assigned Teams: Catalog and Routing
CAR Domain/s: 🟩 Routing and Topology
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In the course of developing authoritative shards we discovered that there is a potential problem with the protocol that requires us to resolve a split-brain scenario.

In the past we've chosen to avoid this by waiting until a known majority-available timestamp is available on the node. However, this has its own issues: the wait only completes once the split-brain scenario is resolved, which causes an unavailability problem until then.

Another way we've handled this in the past is by adding a timeout mechanism that lets the caller decide whether to keep waiting or to abort and retry its operation on a different node. However, this has never been standardized, so each location that needs to address this has its own timeout setting, leading to a proliferation of server parameters.
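For illustration, the timeout approach described above might look roughly like the following. This is a minimal sketch and not the server's actual code: `Node`, `MajorityWaitTimeout`, `wait_for_majority_timestamp`, and `run_on_first_ready_node` are hypothetical names, and timestamps are modeled as plain integers.

```python
import threading


class MajorityWaitTimeout(Exception):
    """Raised when a node does not reach the requested timestamp in time."""


class Node:
    """Hypothetical node tracking the latest majority-available timestamp it knows."""

    def __init__(self, name, majority_ts=0):
        self.name = name
        self._majority_ts = majority_ts
        self._cond = threading.Condition()

    def advance_majority_timestamp(self, ts):
        # Called when the node learns of a newer majority-available timestamp.
        with self._cond:
            if ts > self._majority_ts:
                self._majority_ts = ts
                self._cond.notify_all()

    def wait_for_majority_timestamp(self, ts, timeout_secs):
        # Block until the node has reached `ts`, or give up after `timeout_secs`.
        with self._cond:
            reached = self._cond.wait_for(
                lambda: self._majority_ts >= ts, timeout=timeout_secs
            )
            if not reached:
                raise MajorityWaitTimeout(
                    f"{self.name} did not reach timestamp {ts} within {timeout_secs}s"
                )
            return self._majority_ts


def run_on_first_ready_node(nodes, ts, timeout_secs, operation):
    """Try each node in order; on timeout, abort and retry on the next node."""
    last_error = MajorityWaitTimeout("no nodes to try")
    for node in nodes:
        try:
            node.wait_for_majority_timestamp(ts, timeout_secs)
        except MajorityWaitTimeout as exc:
            last_error = exc
            continue  # Caller's decision here: move on instead of waiting longer.
        return operation(node)
    raise last_error
```

In this sketch the retry policy is "try the next node", but a caller could equally choose to wait again on the same node. The point is only that the timeout hands the decision back to the caller.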

Ideally, we would have a single timeout that is used across the codebase to resolve the split-brain scenario.

This ticket is about finding the locations that use such timeouts, whether to resolve split-brain scenarios or any equivalent situation that needs to wait for a given node to reach a majority timestamp, and deciding how to unify them into a single place.
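As a rough illustration of what "a single place" could mean, the sketch below has every call site read one shared setting instead of defining its own parameter. All names (`MajorityWaitPolicy`, `get_majority_wait_policy`, `set_majority_wait_timeout_ms`) and the 5000 ms default are assumptions made for this example; the ticket does not prescribe a design or a value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MajorityWaitPolicy:
    """Hypothetical single source of truth for the majority-timestamp wait timeout."""

    timeout_ms: int = 5000  # Placeholder default, not a value taken from the ticket.

    def expired(self, started_ms, now_ms):
        # True once the wait has lasted at least the configured timeout.
        return now_ms - started_ms >= self.timeout_ms


# One process-wide policy instead of one server parameter per call site.
_policy = MajorityWaitPolicy()


def get_majority_wait_policy():
    """Every location that waits on a majority timestamp reads the same policy."""
    return _policy


def set_majority_wait_timeout_ms(timeout_ms):
    """Update the shared timeout; stands in for setting a single server parameter."""
    global _policy
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    _policy = MajorityWaitPolicy(timeout_ms=timeout_ms)
```

A call site would then ask `get_majority_wait_policy().expired(started_ms, now_ms)` rather than consulting a location-specific setting, so changing the timeout in one place changes it everywhere.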

Assignee: Unassigned
Reporter: Jordi Olivares Provencio
Participants: Jordi Olivares Provencio
Votes: 0
Watchers: 4

Created: Mar 05 2026 03:42:33 PM UTC
Updated: May 22 2026 10:37:46 AM UTC