Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Workload Scheduling
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Workload Scheduling 2024-07-22, Workload Scheduling 2024-08-05
Linked BF Score: 200
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The test case GetKeyReadConcernMajorityNotAvailableYetDeadline expects the failpoint "keyRefreshFailWithReadConcernMajorityNotAvailableYet" to be hit twice: once when the initial refresh on the monitoring thread (kicked off by startMonitoring) fails, and a second time when the additional thread's call to getKeysForValidation wakes the monitoring thread, which then hits the failpoint again.

However, the additional thread's getKeysForValidation may call refreshNow and set _refreshRequest before the initial refresh hits the failpoint. In that case, the _refreshRequest enqueued by the additional thread is consumed as part of the initial monitoring refresh, so nothing wakes the monitoring thread to hit the failpoint a second time. The main thread waits to advance the clock until the failpoint has been hit twice, because it expects the initial refresh and the getKeysForValidation-prompted refresh to fail separately; it does not account for the interleaving where the getKeysForValidation-prompted refresh is folded into the initial refresh.

In other words, the problematic interleaving is:

| Monitoring thread | Unittest-spawned thread | Unittest main thread |
|---|---|---|
| | Call getKeysForValidation | |
| | Call refreshNow | |
| | Set _refreshRequest | |
| Call _doPeriodicRefresh | | |
| Consume _refreshRequest | | |
| Hit failpoint, retry | | |
| Sleep in waitForConditionOrInterrupt waiting for additional request or timeout | | |
| | | Wait for failpoint to be hit twice (hangs here, as it has only been hit once) |
| | | Advance clock |
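
The two orderings can be compared with a minimal single-threaded model. This is only a sketch under stated assumptions, not the actual KeysCollectionManager code: MonitorModel, Order and failpointHitsFor are hypothetical names, every refresh is assumed to fail at the failpoint, and the mock clock is assumed never to advance, so only a pending request can wake the monitor.

```cpp
#include <cassert>

// Hypothetical model of the monitoring loop; not the real MongoDB classes.
struct MonitorModel {
    bool refreshRequested = false;  // stands in for _refreshRequest
    int failpointHits = 0;          // how often the modelled failpoint fired

    // Models refreshNow(): record that a refresh was requested.
    void refreshNow() { refreshRequested = true; }

    // Models one refresh pass: consume any pending request, then fail at
    // the failpoint (assumed to be always enabled in this model).
    void doPeriodicRefresh() {
        refreshRequested = false;
        ++failpointHits;
    }

    // Models the sleep after a failed refresh. The clock never advances
    // here, so only a pending request wakes the monitor.
    bool wakesUp() const { return refreshRequested; }
};

enum class Order { RequestAfterInitialRefresh, RequestBeforeInitialRefresh };

// Returns how many times the failpoint is hit before the monitor sleeps
// with nothing left to wake it.
int failpointHitsFor(Order order) {
    MonitorModel m;
    if (order == Order::RequestBeforeInitialRefresh) {
        m.refreshNow();  // request lands before the initial refresh runs
    }
    m.doPeriodicRefresh();  // initial refresh started by startMonitoring
    if (order == Order::RequestAfterInitialRefresh) {
        m.refreshNow();  // request lands while the monitor is sleeping
    }
    while (m.wakesUp()) {
        m.doPeriodicRefresh();
    }
    assert(m.failpointHits >= 1);  // the initial refresh always runs
    return m.failpointHits;
}
```

In this model, failpointHitsFor(Order::RequestAfterInitialRefresh) yields 2, which is what the test waits for, while failpointHitsFor(Order::RequestBeforeInitialRefresh) yields 1, which corresponds to the hang.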

The issue can be reproduced by adding a sleep of 500 milliseconds to the monitoring thread before it calls _doPeriodicRefresh in KeysCollectionManager::PeriodicRunner::start.
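
The effect of such a delay can be illustrated with a small threaded stand-in. This is a hedged sketch, not the real test or the real PeriodicRunner: runWithMonitorDelay and all of its variables are hypothetical, the refresh is assumed to always fail at the failpoint, and the main thread uses a bounded wait instead of hanging so that the example terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the unit test. Returns the number of failpoint
// hits the main thread observed before its bounded wait ended.
int runWithMonitorDelay(std::chrono::milliseconds delay) {
    assert(delay >= std::chrono::milliseconds::zero());
    std::mutex mu;
    std::condition_variable cv;
    bool refreshRequested = false;  // stands in for _refreshRequest
    bool shutdown = false;
    int failpointHits = 0;

    std::thread monitor([&] {
        // Injected delay before the first refresh, as in the repro.
        std::this_thread::sleep_for(delay);
        std::unique_lock<std::mutex> lk(mu);
        while (!shutdown) {
            refreshRequested = false;  // consume any pending request
            ++failpointHits;           // refresh fails at the failpoint
            cv.notify_all();           // let the main thread see the hit
            // No clock advance in this model: wait only for a new request.
            cv.wait(lk, [&] { return refreshRequested || shutdown; });
        }
    });

    // Models the unittest-spawned thread calling refreshNow.
    std::thread requester([&] {
        std::lock_guard<std::mutex> lk(mu);
        refreshRequested = true;
        cv.notify_all();
    });
    requester.join();

    int observed = 0;
    {
        // The real test waits indefinitely for two hits; bound it here.
        std::unique_lock<std::mutex> lk(mu);
        cv.wait_for(lk, delay + std::chrono::milliseconds(300),
                    [&] { return failpointHits >= 2; });
        observed = failpointHits;
        shutdown = true;
    }
    cv.notify_all();
    monitor.join();
    return observed;
}
```

With a delay long enough for the request to land first (for example 200 milliseconds), the request is consumed by the first refresh and the main thread's wait for a second hit times out.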

Assignee: George Wangensteen (Inactive)
Reporter: George Wangensteen (Inactive)
Participants: George Wangensteen, Githook User
Votes: 0
Watchers: 1

Created: Jul 22 2024 03:55:42 PM UTC
Updated: Jul 25 2024 02:58:43 PM UTC
Resolved: Jul 25 2024 02:58:42 PM UTC
Confidence Status Last Update: 22/Jul/24 3:57 PM
