Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: cs-bf-external
Assigned Teams: Service Arch
Operating System: ALL
Sprint: Workload Scheduling 2024-07-22
Linked BF Score: 0
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The lone BFG on BF-33912 at the time of writing appears to have been caused by a replication error on a TSAN variant (which is quite slow), which left a host down for longer than usual (see the comments here for details).

As a result, the client threads received a mix of "Connection reset by peer," "Connection refused," and "HostUnreachable" errors. Of these, only HostUnreachable is considered a retriable error that does not consume the retry limit.

In suites that kill or terminate shard processes, we should expect network errors to occur more frequently and to be transient.
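
To illustrate the distinction between errors that consume the retry limit and errors that do not, here is a minimal Python sketch. It is not the test harness's actual code: the function names, the `max_transient_retries` bound, and the mapping of the three error messages to `errno` values are all assumptions made for illustration.

```python
import errno

# Assumption: the three errors from the ticket are modeled as OS-level errno
# values. The real harness may classify them by a different mechanism.
TRANSIENT_ERRNOS = {errno.ECONNRESET, errno.ECONNREFUSED, errno.EHOSTUNREACH}


def is_transient_network_error(exc):
    """Return True for network errors expected while a shard process is down."""
    return isinstance(exc, OSError) and exc.errno in TRANSIENT_ERRNOS


def run_with_retries(op, retry_limit=3, max_transient_retries=100):
    """Call op() until it succeeds (hypothetical helper).

    Transient network errors are retried without consuming retry_limit;
    they are bounded separately by max_transient_retries so the loop
    cannot spin forever. Any other error consumes one retry.
    """
    retries_used = 0
    transient_seen = 0
    while True:
        try:
            return op()
        except Exception as exc:
            if is_transient_network_error(exc) and transient_seen < max_transient_retries:
                transient_seen += 1
                continue
            retries_used += 1
            if retries_used > retry_limit:
                raise
```

A real client thread would likely also sleep or back off between attempts; that is omitted here to keep the sketch short.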

Assignee: George Wangensteen (Inactive)
Reporter: Brett Nawrocki
Participants: Brett Nawrocki, George Wangensteen
Votes: 0
Watchers: 2

Created: Jul 03 2024 06:30:11 PM UTC
Updated: Jul 15 2024 07:30:45 PM UTC
Resolved: Jul 15 2024 07:30:44 PM UTC
