[SERVER-56768] Race in atomic compareAndSwap can make FailPoint::enableFailPoint() spin forever Created: 07/May/21  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: 4.0.24
Fix Version/s: 4.0 Required

Type: Bug Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Operating System: ALL
Participants:

 Description   

This is not a production bug, so treat it accordingly.

The race:
Initially, _fpInfo was ( 1 << 31 ).
1. Thread 1 enters FailPoint::slowShouldFailOpenBlock() and increments _fpInfo to ( 1 << 31 ) + 1
2. Thread 2 enters disableFailPoint() and reads _fpInfo, now ( 1 << 31 ) + 1, into currentVal
3. Thread 1 enters FailPoint::shouldFailCloseBlock() and decrements _fpInfo back to ( 1 << 31 )
4. Thread 2's compareAndSwap fails because _fpInfo no longer matches currentVal; since currentVal is never refreshed, Thread 2 spins forever
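
The interleaving above can be replayed deterministically on a single thread. The following is a minimal sketch, not the actual FailPoint code: it uses std::atomic<std::uint32_t> as a stand-in for the server's atomic word type, assumes bit 31 is the "active" flag with a reference count in the lower bits (as the values in the steps suggest), and the names kActiveBit, casByValue, and replayStaleLoop are hypothetical. The retry loop is bounded so that the example terminates.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: bit 31 is the "active" flag, lower bits are a ref count.
constexpr std::uint32_t kActiveBit = 1u << 31;

// Hypothetical CAS helper that takes `expected` by value, so the caller's
// copy is never refreshed when the exchange fails.
bool casByValue(std::atomic<std::uint32_t>& word,
                std::uint32_t expected,
                std::uint32_t desired) {
    return word.compare_exchange_strong(expected, desired);
}

// Replays steps 1-4 on one thread and returns how many of the bounded
// CAS attempts succeeded.
int replayStaleLoop(int maxAttempts) {
    std::atomic<std::uint32_t> fpInfo{kActiveBit};    // initial state

    fpInfo.fetch_add(1);                              // step 1: thread 1 enters
    const std::uint32_t currentVal = fpInfo.load();   // step 2: thread 2 reads
    assert(currentVal == kActiveBit + 1);
    fpInfo.fetch_sub(1);                              // step 3: thread 1 leaves

    // Step 4: thread 2 retries with the stale snapshot. An unbounded loop
    // would spin forever; this one gives up after maxAttempts.
    int successes = 0;
    for (int i = 0; i < maxAttempts; ++i) {
        if (casByValue(fpInfo, currentVal, currentVal & ~kActiveBit)) {
            ++successes;
        }
    }
    return successes;
}
```

Under these assumptions, replayStaleLoop(1000) reports zero successful exchanges, because the stored value ( 1 << 31 ) never again equals the stale snapshot ( 1 << 31 ) + 1.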

Even though this is not a production failure, someone may copy-paste this pattern into production code. Atomics are subtle and must be treated with care.

Fix: reload _fpInfo on every iteration of the loop. The same applies to enableFailPoint().
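
One way to express the proposed fix is sketched below. This is an illustration under stated assumptions rather than the server's implementation: std::atomic<std::uint32_t> stands in for the server's atomic word type, bit 31 is assumed to be the "active" flag, and clearActiveBit, replayWithReload, kActiveBit, and kRefCountMask are hypothetical names. The sketch relies on std::atomic's compare_exchange_weak writing the currently stored value back into its first argument on failure, which gives the reload on each iteration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: bit 31 is the "active" flag, lower bits are a ref count.
constexpr std::uint32_t kActiveBit = 1u << 31;
constexpr std::uint32_t kRefCountMask = ~kActiveBit;

// Clears the active bit while preserving the ref count. `snapshot` models a
// value the caller read earlier and that may already be stale.
void clearActiveBit(std::atomic<std::uint32_t>& word, std::uint32_t snapshot) {
    std::uint32_t expected = snapshot;
    // The desired value is recomputed from `expected` on every iteration.
    while (!word.compare_exchange_weak(expected, expected & kRefCountMask)) {
        // On failure, `expected` now holds the value currently in `word`,
        // so the next attempt uses fresh data.
    }
}

// Replays the race from the description on one thread, then applies the
// reloading loop. Returns the final value of the word.
std::uint32_t replayWithReload() {
    std::atomic<std::uint32_t> fpInfo{kActiveBit};

    fpInfo.fetch_add(1);                            // thread 1 enters the block
    const std::uint32_t snapshot = fpInfo.load();   // thread 2 reads
    assert(snapshot == kActiveBit + 1);
    fpInfo.fetch_sub(1);                            // thread 1 leaves the block

    clearActiveBit(fpInfo, snapshot);               // thread 2 retries safely
    return fpInfo.load();
}
```

In this replay the first exchange fails against the stale snapshot, the second uses the refreshed value and succeeds, and replayWithReload() returns 0: the active bit is cleared and the reference count is back to zero.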


Generated at Thu Feb 08 05:40:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.