[SERVER-48146] FailPoint usage is unsafe at static init and shutdown time Created: 12/May/20 Updated: 29/Oct/23 Resolved: 17/Dec/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Billy Donahue | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | servicearch-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Service arch 2020-12-28 | ||||||||
| Participants: | |||||||||
| Description |
|
FailPoint objects are designed to be defined as nonlocal objects, and referred to in arbitrary code paths as e.g. `if (failpoint.shouldFail()) ...` to provide test injection and instrumentation. Calls to FailPoint::shouldFail must to be correct even before the failpoint initializer has run, or after the failpoint has been destroyed. We can fix it by making FailPoint objects into trivial facade objects using std::aligned_storage that lazily initializes its state when member functions are called. For performance paths that are known to be lifetime-safe, we can use an accessor to get at the unchecked state of the FailPoint, skipping the atomic initializer guard. |
| Comments |
| Comment by Githook User [ 17/Dec/20 ] |
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: invariant on use of pre-init FailPoint objects |
| Comment by Billy Donahue [ 10/Dec/20 ] |
| Comment by Billy Donahue [ 17/Nov/20 ] |
|
Candidate for the StaticImmortal wrapper type. |
| Comment by Mira Carey [ 12/May/20 ] |
|
I think we should consider on the other side simply leaking fail points. The construction side is less of a concern for me (because any code meant to run in the server will already have to wait for the initializer graph (until after we've processed test commands enabled)). The shutdown side is compelling, but I suspect that leaking fail points is an easier to think about solution than requiring fail points to be plain data |