[SERVER-56762] FailPoint::setMode() could be blocked for hours, causing tests to time out Created: 07/May/21 Updated: 27/Oct/23 Resolved: 11/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Shuvalov (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Service Arch |
| Operating System: | ALL |
| Participants: |
| Description |
|
Our use case requires a fail point that suspends a thread for over an hour. However, the loop in FailPoint::setMode() blocks until the thread leaves the fail point block. Repro: while running this test, the loop did 42,095 iterations and the test eventually timed out. The fail point was configured to block the Hello command processing thread for 100 minutes. I think the purpose of this spin wait is to prevent test flakiness by eliminating subtle races. An unbounded wait is not necessary for that purpose. I propose limiting the wait to 1 minute, which is long enough for all race cases. This is a blocker for submitting integration tests for HELP ticket, so I will propose a fix. |
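A minimal sketch of the bounded wait proposed above, for illustration only. `SketchFailPoint`, `enter()`, `leave()` and `setEnabled()` are hypothetical names, not the server's FailPoint API, and the reference counting is an assumption about how such a wait might be tracked. The mode change takes effect immediately; the caller then waits for in-flight threads only until a deadline.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a fail point; NOT the server's FailPoint class.
class SketchFailPoint {
public:
    // A thread calls enter() when it starts executing the fail point block
    // and leave() when it is done.
    void enter() { _activeRefs.fetch_add(1); }
    void leave() { _activeRefs.fetch_sub(1); }

    bool enabled() const { return _enabled.load(); }

    // Applies the new mode, then waits at most `maxWait` for threads that
    // are still inside the block. Returns true if they all left in time,
    // false if the deadline passed first.
    bool setEnabled(bool enabled, std::chrono::milliseconds maxWait) {
        assert(maxWait.count() >= 0);
        _enabled.store(enabled);

        const auto deadline = std::chrono::steady_clock::now() + maxWait;
        while (_activeRefs.load() != 0) {
            if (std::chrono::steady_clock::now() >= deadline) {
                return false;  // Give up instead of spinning for hours.
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return true;
    }

private:
    std::atomic<bool> _enabled{false};
    std::atomic<int> _activeRefs{0};
};
```

With a one-minute `maxWait`, a thread parked in the block for 100 minutes would no longer stall the caller; the return value tells the test whether the block had actually drained.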
| Comments |
| Comment by Andrew Shuvalov (Inactive) [ 11/May/21 ] |
|
Not necessary |
| Comment by Andrew Shuvalov (Inactive) [ 07/May/21 ] |
|
Note: the current head has pauseWhileSet(). Do you want me to port it to the 4.0 branch instead of the proposed solution? |