Use SIGKILL rather than clean shutdown after a failing test


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • DevProd Correctness
    • Fully Compatible
    • 2024-10-29
    • 3

      Right now, after a test fails, we still try to do a clean shutdown of all servers in the cluster under test. This slows turnaround and produces a lot of useless log messages after the failure, which we have to scroll past to find the actual failure message. This is particularly bad for tests that use a large number of servers, such as sharding tests. There is also a risk that the servers were left in a state where they hang while shutting down (e.g., if some failpoints were left active). Instead, once we know we have a definite failure, we should just abort the servers as quickly as possible using SIGKILL.

      I think this applies both to servers launched by the test itself (e.g., by ShardingTest) and to externally managed servers launched by resmoke.
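A minimal sketch of the proposed teardown policy, assuming each server is represented by a `subprocess.Popen` handle on a POSIX system. The helper name `stop_servers`, its `test_failed` and `timeout` parameters, and the use of SIGTERM as a stand-in for the clean-shutdown path are all hypothetical illustrations, not resmoke's or the shell's actual API:

```python
import signal
import subprocess
import sys


def stop_servers(procs, test_failed, timeout=30.0):
    """Stop server processes at the end of a test (hypothetical helper).

    On a definite failure, send SIGKILL right away: it cannot be caught or
    ignored, so a server wedged by a leftover failpoint cannot stall the
    teardown or emit more log output after the failure. Otherwise send
    SIGTERM, used here as a stand-in for a clean-shutdown request.
    POSIX only: signal.SIGKILL does not exist on Windows.
    """
    sig = signal.SIGKILL if test_failed else signal.SIGTERM
    for p in procs:
        if p.poll() is None:  # still running
            p.send_signal(sig)
    for p in procs:
        try:
            p.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The clean shutdown hung: fall back to SIGKILL.
            p.kill()
            p.wait()
    # A negative return code -N means the process was ended by signal N.
    return [p.returncode for p in procs]


if __name__ == "__main__":
    # Stand-in "servers": child processes that just sleep.
    servers = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(3)
    ]
    print(stop_servers(servers, test_failed=True))  # [-9, -9, -9] on Linux/macOS
```

The SIGKILL fallback after `timeout` is a design choice in this sketch: it keeps a hung clean shutdown from stalling even a passing run, while a failing run skips the clean-shutdown attempt entirely.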

            Assignee:
            Mikhail Shchatko
            Reporter:
            Mathias Stearn
            Votes:
            0
            Watchers:
            8
