[SERVER-54586] Failpoints are no longer free when not enabled Created: 17/Feb/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | sa-remove-fv-backlog-22 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Service Arch
|
| Participants: | |
| Story Points: | 4 |
| Description |
|
One of the defining purposes of failpoints is that they should be free when not enabled, in order to make it reasonable to liberally use them even in fairly hot codepaths. There are at least two issues adding unnecessary costs to disabled failpoints:
In addition to solving these, we should evaluate what it would take to give failpoint a constexpr constructor (so that it is created during const-init rather than dynamic-init), and ideally a trivial destructor, so that it never gets destroyed. |
| Comments |
| Comment by Billy Donahue [ 27/Apr/21 ] | |||||||||||||||||||||||||
|
Just a note that the commit Sending the ticket back to "Open" status rather than closing it fixed, just because I think there's more work here than just writing a benchmark. that was just to baseline the problem. | |||||||||||||||||||||||||
| Comment by Githook User [ 27/Apr/21 ] | |||||||||||||||||||||||||
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@mongodb.com', 'username': 'puppyofkosh'}Message: SERVER-54586 FailPoint benchmark | |||||||||||||||||||||||||
| Comment by Billy Donahue [ 20/Apr/21 ] | |||||||||||||||||||||||||
|
Benchmark: Doesn't really show a cost beyond that of an atomic load.
| |||||||||||||||||||||||||
| Comment by Billy Donahue [ 19/Feb/21 ] | |||||||||||||||||||||||||
|
Notes from Slack discussion: Regarding constexpr and early accessor checks:
Regarding inlining the fast path:
| |||||||||||||||||||||||||
| Comment by Billy Donahue [ 19/Feb/21 ] | |||||||||||||||||||||||||
|
In But I also added an _impl() call in front of most accessors of interior state, and it looks like that is just too much to ask for fast paths according to this ticket. A trivial destructor was proposed above but FailPoint already has a minimal destructor, as the main guts are moved into the _impl, which is made with placement new and not destroyed in production failpoints. This which was important for The fast path can be made faster again by making an atomic bool that lives outside the _impl(), so you don't have to call _impl() first to read the enable/disable state therein. I didn't think that FailPoints were so super-performant that we can't make a function call and come back. It's clear that the FailPoint API with all its perf-jockeying needs a benchmark if we're going to keep that level of performance up. It would be nice to put before-and-after numbers up when making a change. The point about inlining I didn't understand, though. "And because the out-of-line function contains the whole code path rather than just the fast check, it needs to do all of the expensive prolog and epilog, including stack-smash protection." It sounds like there's been real impact observed, but I didn't understand how the length of the inlined function matters to the prolog/epilog, as long as it has an early return. That's interesting. I tried to make a constexpr ctor in an effort to make them guaranteed init-safe but I think I couldn't figure out how to do it. I think some of the ideas I had at the time required users to access the failpoint with deref syntax and I just didn't want to go there at the time. I don't remember. |