[SERVER-48146] FailPoint usage is unsafe at static init and shutdown time Created: 12/May/20  Updated: 29/Oct/23  Resolved: 17/Dec/20

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Improvement Priority: Major - P3
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: servicearch-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-53409 refresh some names in FailPoint imple... Closed
Backwards Compatibility: Fully Compatible
Sprint: Service arch 2020-12-28
Participants:

 Description   

FailPoint objects are designed to be defined as nonlocal objects, and referred to in arbitrary code paths as e.g. `if (failpoint.shouldFail()) ...` to provide test injection and instrumentation. Calls to FailPoint::shouldFail must to be correct even before the failpoint initializer has run, or after the failpoint has been destroyed.

We can fix it by making FailPoint objects into trivial facade objects using std::aligned_storage that lazily initializes its state when member functions are called.

For performance paths that are known to be lifetime-safe, we can use an accessor to get at the unchecked state of the FailPoint, skipping the atomic initializer guard.



 Comments   
Comment by Githook User [ 17/Dec/20 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-48146 init-safe, shutdown-safe failpoint

invariant on use of pre-init FailPoint objects
Minimize public API surface with Impl.
Some renames to try to align old names with new code.
Branch: master
https://github.com/mongodb/mongo/commit/42dae28cbff75e5e71199bdb6946673fc61b3436

Comment by Billy Donahue [ 10/Dec/20 ]

https://mongodbcr.appspot.com/748170001/

Comment by Billy Donahue [ 17/Nov/20 ]

Candidate for the StaticImmortal wrapper type.

Comment by Mira Carey [ 12/May/20 ]

I think we should consider on the other side simply leaking fail points. The construction side is less of a concern for me (because any code meant to run in the server will already have to wait for the initializer graph (until after we've processed test commands enabled)). The shutdown side is compelling, but I suspect that leaking fail points is an easier to think about solution than requiring fail points to be plain data

Generated at Thu Feb 08 05:16:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.