[SERVER-53380] Improve assert_util classes and implementation Created: 15/Dec/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Tyler Seip (Inactive) Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 1
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-55449 Add a tassertStatusOK() macro to asse... Closed
is related to SERVER-53135 Tassert should only exit with error i... Closed
Assigned Teams:
Service Arch
Participants:

 Description   

In SERVER-53135, a number of improvements were identified by billy.donahue, outlined primarily inĀ this CR. A short (non-exhaustive) summary of the proposed changes:

  • Remove the constructor for AssertionCount (it requires no special initialization and we are currently erroneously missing tripwire initialization)
  • Add tripwire to the rollover logic
  • Add proper methods to AssertionCount to increment and handle rollovers of each sub-AtomicWord
  • Keep a ring buffer of tripwire occurrences and dump them again upon checking them
  • Dump tripwire occurrences during the myTerminate termination handler


 Comments   
Comment by Billy Donahue [ 09/Mar/21 ]

So far everything on this ticket has been about tassert and AssertionCount.
There are more issues that came up during https://mongodbcr.appspot.com/749640009/, and this comment captures feedback from that effort and review.

It would be good to unify the assert macros through a central MONGO_BASE_ASSERT_VA_DISPATCH.
So the different macros vary only in which function name handles their failures.
Then the whole API which is notoriously jagged, would be much easier to teach and maintain.

SERVER-55017 unifies iassert and tassert. A speculative dev branch brought massert and uassert into that same umbrella.
fassert might be too special to unify, but that's ok.

There are numerous small things like:

  • dassert being unsafe when followed by an else.
  • MONGO_COMPILER_NOINLINE on functions that aren't inlineable because they're in the .cpp file?
  • Unclear relationship between assert_util.h to invariant.h. Renaming invariant.h -> assert_util_core.h would help?
  • Use SourceLocation instead of _FILE and LINE_ everywhere. Remove "WithLocation" suffixes on function names. This is implied by the SourceLocation parameter now that it's a type.
  • Document the macros very carefully in the header. Especially important since they are variadic.
  • Group all the causedBy declarations together and dedupe them.
  • MONGO_verify uses a language-reserved name `_Expression`, rename it.

Some of these are addressed by this PoC:
https://github.com/mongodb/mongo/compare/master...BillyDonahue:SERVER-55017_advanced

Comment by Billy Donahue [ 09/Mar/21 ]

A few thoughts here.

  • The best thing would be to just upgrade AssertionCount to 64-bit counters, if we can swing that. Forget rollover.
  • Failing that, we never want tripwire count to go back down to zero or it loses its magic, so we have to be careful there.
    Tripwire is different. We log its absolute count. This causes a problem. We need a tassert count since the last rollover for reporting to serverstatus, but then a different absolute number (64-bits) for the "occurrences" in the tripwire shutdown message.
  • Is it possible to remove the "warning" element which corresponds to the removed wassert macro and is always zero now?
  • Maybe mutex the whole AssertionCount object instead of fiddling with 5 individual atomics (that are all sharing a cache line).
Comment by Benjamin Caimano (Inactive) [ 21/Dec/20 ]

I think perhaps the biggest difference would be that you would still potentially mark the suite and test that encountered the tripwire as without failure. Devs tend to be rightfully frustrated when they experience "failure" after being told there was "success". We also need to reset the counts for the tripwire tests themselves, but we can make a specific fixture for that.

Comment by Billy Donahue [ 16/Dec/20 ]

I believe the general concern was that it would be difficult to associate tripwire failures with individual suites given how big some of our linked unittests are.

It would not be difficult unless I'm missing something here.

Here's a piece of util_test on some evergreen run:

[cpp_unit_test:util_test] 2020-12-16T00:29:55.837+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23063,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"suite":"DurationComparisonSameType"}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.837+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"test":"CompareEqual","rep":1,"reps":1}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.837+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"test":"CompareNotEqual","rep":1,"reps":1}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.837+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"test":"SameTypeCompareGreaterThan","rep":1,"reps":1}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.837+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"test":"CompareLessThan","rep":1,"reps":1}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.837+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"test":"CompareGreaterThanOrEqual","rep":1,"reps":1}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.838+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23059,   "ctx":"DiagnosticInfoTest","msg":"Running","attr":{"test":"CompareLessThanOrEqual","rep":1,"reps":1}}
[cpp_unit_test:util_test] 2020-12-16T00:29:55.838+0000 {"t":{"$date":"2020-12-16T00:29:55.834Z"},"s":"I",  "c":"TEST",     "id":23060,   "ctx":"DiagnosticInfoTest","msg":"Done running tests"}
 
...

The tripwire failure emits a log line with a conspicuous id number that would appear in the unit test output.
Its analysis exit handler would direct the user to look for those log statements.
The closest preceding id:23059 message would identify the test that was running when the tassert fired.

Comment by Benjamin Caimano (Inactive) [ 15/Dec/20 ]

I believe the general concern was that it would be difficult to associate tripwire failures with individual suites given how big some of our linked unittests are. For better or worse, it did slip past my initial review. I noticed it again when I was evaluating the potential for a backport.

Comment by Billy Donahue [ 15/Dec/20 ]

Yeah it's really important that the unittest::Test base class setUp and tearDown should not be doing any work at all.

That should not have passed review, really.

Is it really necessary to check for tripwireAssertions after each TEST/TEST_F?
The test process would abort and prevent other viable tests from running.
The check should move to unittest_main, IMO.

Really it could just be an atexit lazy-added from the tassert.
Then unittest/ doesn't have to know anything about it.

Comment by Benjamin Caimano (Inactive) [ 15/Dec/20 ]

If we're doing a laundry list, we should probably move the tripwire assertion updates in our unittests here to the unittest::Test ctor/dtor instead. People tend to override those functions which would thus disable the tripwire checks.

Generated at Thu Feb 08 05:30:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.