[SERVER-56762] FailPoint::setMode() could be blocked for hours, causing tests to time out Created: 07/May/21  Updated: 27/Oct/23  Resolved: 11/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Backlog - Service Architecture
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Operating System: ALL
Participants:

 Description   

We have a use case that requires a fail point to suspend a thread for over an hour. However, the loop in FailPoint::setMode() blocks until the fail point block exits.

Repro: while running this test, the loop ran 42,095 iterations and the test eventually timed out. The fail point was configured to block the Hello command processing thread for 100 minutes.

I think the purpose of this spin wait is to prevent test flakiness by eliminating subtle races. An unbounded wait is not necessary for that purpose. I propose limiting the wait to 1 minute, which is long enough for all race cases.
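A minimal sketch of what a bounded wait could look like, for illustration only. ToyFailPoint, activeRefs, and waitForDrain() are hypothetical names and do not reflect the actual FailPoint implementation; the sketch assumes the spin wait polls a counter of threads currently inside the fail point block.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a fail point; NOT the real MongoDB FailPoint class.
struct ToyFailPoint {
    // Assumed: number of threads currently inside the fail point block.
    std::atomic<int> activeRefs{0};

    // Polls until no thread is inside the block or maxWait has elapsed.
    // Returns true if the block drained, false if the wait timed out.
    bool waitForDrain(std::chrono::milliseconds maxWait) {
        const auto deadline = std::chrono::steady_clock::now() + maxWait;
        while (activeRefs.load(std::memory_order_acquire) > 0) {
            if (std::chrono::steady_clock::now() >= deadline) {
                return false;  // give up instead of spinning indefinitely
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return true;
    }
};
```

With a one-minute limit, the caller would pass std::chrono::minutes(1) and decide what to do on a false return; whether proceeding after a timeout is safe depends on the races the original wait was meant to close.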

This is a blocker for submitting integration tests for a HELP ticket, so I will propose a fix.



 Comments   
Comment by Andrew Shuvalov (Inactive) [ 11/May/21 ]

Not necessary

Comment by Andrew Shuvalov (Inactive) [ 07/May/21 ]

Note: the current head has pauseWhileSet(). Do you want me to port it to the 4.0 branch instead of the proposed solution?
