Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.0-rc0, 4.7.0
Affects Version/s: None
Component/s: Internal Code
Labels: None

Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Sprint: Service arch 2020-04-20
Linked BF Score: 23
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

CancelRemotelyTimeout should not wait for the _killOperations command issued by cancelCommand to be killed when it times out, since we don't kill operations that have no operation key (i.e. operations sent without hedging options). The test rarely fails or hangs because the currentOp check and the _killOperations command both use a 1-second timeout. However, there is a race between when the _killOperations operation starts and when the currentOp check starts: the currentOp check can return without waiting because no _killOperations operation is running yet, in which case the test fails because the number of timed-out commands is zero. To fix this, we should remove the currentOp check and instead use a failpoint to wait for the _killOperations command to time out. That failpoint should be placed at the end of NetworkInterfaceTL::tryFinish, with a predicate that returns true only when the command name matches the request's command name and the error code matches the response status.
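The failpoint-based fix can be modeled outside the server. The following is a minimal, self-contained C++ sketch, not MongoDB's actual FailPoint API: `PredicateFailPoint`, `RemoteResponse`, `ErrorCode`, `simulatedTryFinish`, and `killOperationsTimedOut` are all hypothetical names. It illustrates the two properties the fix relies on: the predicate filters on both command name and response status, so unrelated responses are ignored, and the hit count persists, so the wait succeeds even if the matching response finished before the test started waiting. A currentOp snapshot, by contrast, only sees operations running at that instant.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical status codes; the real server uses its own ErrorCodes enum.
enum class ErrorCode { kOK, kTimedOut };

// Hypothetical stand-in for what tryFinish knows about a finished request.
struct RemoteResponse {
    std::string commandName;  // command name taken from the original request
    ErrorCode code;           // status of the response
};

// Minimal model of a failpoint whose hits are counted only when a predicate matches.
class PredicateFailPoint {
public:
    using Predicate = std::function<bool(const RemoteResponse&)>;

    void enable(Predicate pred) {
        std::lock_guard<std::mutex> lk(_mutex);
        _pred = std::move(pred);
        _timesEntered = 0;
    }

    // Evaluated for every finished response; only matching responses count as a hit.
    void evaluate(const RemoteResponse& resp) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_pred && _pred(resp)) {
            ++_timesEntered;
            _cv.notify_all();
        }
    }

    // Blocks until the failpoint has been entered at least `n` times or the timeout expires.
    // The counter is checked before blocking, so hits that happened earlier are not missed.
    bool waitForTimesEntered(int n, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [&] { return _timesEntered >= n; });
    }

    int timesEntered() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _timesEntered;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    Predicate _pred;
    int _timesEntered = 0;
};

// Stand-in for the tail of NetworkInterfaceTL::tryFinish: once the request is
// completed, the failpoint is evaluated against the finished response.
void simulatedTryFinish(const RemoteResponse& resp, PredicateFailPoint& fp) {
    // ... completing the request and running its callback would happen here ...
    fp.evaluate(resp);
}

// Mimics the test: responses arrive on a background thread while the test
// thread blocks on the failpoint instead of polling currentOp.
bool killOperationsTimedOut(std::chrono::milliseconds timeout) {
    PredicateFailPoint fp;
    fp.enable([](const RemoteResponse& r) {
        return r.commandName == "_killOperations" && r.code == ErrorCode::kTimedOut;
    });
    std::thread network([&fp] {
        simulatedTryFinish({"find", ErrorCode::kTimedOut}, fp);             // wrong command
        simulatedTryFinish({"_killOperations", ErrorCode::kOK}, fp);        // wrong status
        simulatedTryFinish({"_killOperations", ErrorCode::kTimedOut}, fp);  // match
    });
    const bool entered = fp.waitForTimesEntered(1, timeout);
    network.join();
    return entered && fp.timesEntered() == 1;
}
```

Because the waiter checks a persistent counter rather than a point-in-time snapshot, there is no window in which it observes "nothing running yet" and returns early; it either sees the hit or keeps waiting until the timeout.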
The failpoint "networkInterfaceAfterAcquireConn" was added to ensure that cancelCommand does not start running until the command acquires a connection (otherwise, no _killOperations command is issued). However, it does not check the command name, so other commands running in the background can also enter this failpoint if they happen to run while the test is inside this block. To avoid this, we should replace the failpoint with a function similar to waitForCommand that runs currentOp repeatedly until a matching operation is running.
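A polling replacement for networkInterfaceAfterAcquireConn could look roughly like the sketch below. It assumes a hypothetical `CurrentOpFn` callback that runs currentOp against the remote node and returns its in-progress entries; `waitForOperation` and `CurrentOpEntry` are illustrative names, not the server's actual waitForCommand helper. Because it matches on the command name, unrelated background commands do not satisfy the wait, unlike the unfiltered failpoint.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical, trimmed-down view of one currentOp entry.
struct CurrentOpEntry {
    std::string commandName;
};

// Hypothetical hook that runs currentOp on the remote node and returns its entries.
using CurrentOpFn = std::function<std::vector<CurrentOpEntry>()>;

// Polls currentOp until an operation running `commandName` appears or the
// timeout elapses. Returns true if a matching operation was observed.
bool waitForOperation(const CurrentOpFn& runCurrentOp,
                      const std::string& commandName,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        const std::vector<CurrentOpEntry> ops = runCurrentOp();
        // Only an operation with the expected command name counts; background
        // commands that happen to be running are ignored.
        const bool found = std::any_of(ops.begin(), ops.end(), [&](const CurrentOpEntry& op) {
            return op.commandName == commandName;
        });
        if (found) {
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
}
```

In the test, such a call would presumably sit where the failpoint wait used to be, before cancelCommand is invoked: if the command is visible in currentOp on the remote node, the request has already been sent over a connection, which is the condition the failpoint was meant to guarantee.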

Assignee: Cheahuychou Mao
Reporter: Cheahuychou Mao
Participants: Cheahuychou Mao, Githook User
Votes: 0
Watchers: 2

Created: Apr 06 2020 06:33:50 AM UTC
Updated: Oct 29 2023 10:09:54 PM UTC
Resolved: Apr 07 2020 09:46:02 PM UTC
Confidence Status Last Update: 06/Apr/20 6:40 AM
