Hang analyzer for WT Evergreen tests


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT10.0.0, 4.9.0, 4.4.3
    • Affects Version/s: None
    • Component/s: None
    • Sprint: Storage - Ra 2020-11-16
    • Story Points: 3

      Triaging and diagnosing hang failures in automated Evergreen tests should be easier.  After determining that a test is hung, Evergreen should automatically collect and report data that will help with the initial triage and diagnosis of the problem.  Ideally we might collect:

      • Which test programs were running at the time of the hang.
      • The WiredTiger directory for those tests (I believe we already keep this for all tests).
      • Core dumps of the hung process(es), to help engineers determine why they hung.
      • Stack traces from the hung processes, included in the Evergreen logs to facilitate triage.

      There is probably other data that would be useful as well.

      MongoDB's resmoke.py includes a hang analyzer for this purpose: buildscripts/resmokelib/commands/hang_analyzer.py. We might be able to use it as the basis for a WT hang analyzer, or simply steal it outright.
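
      As a rough illustration of the stack-trace and core collection described above, here is a minimal Python sketch that attaches gdb to a hung process. It is not resmoke's hang analyzer and not a proposed implementation; the function names (gdb_backtrace_command, collect_hang_data), the timeout default, and the choice of gdb are all assumptions made for the example. It also assumes the core path contains no spaces.

```python
import shutil
import subprocess
from typing import List, Optional


def gdb_backtrace_command(pid: int, core_path: Optional[str] = None) -> List[str]:
    """Build a gdb batch command that dumps every thread's stack for `pid`.

    Hypothetical helper: if `core_path` is given, gdb is also asked to
    write a core file there via its `gcore` command.
    """
    cmd = ["gdb", "--batch", "-p", str(pid), "-ex", "thread apply all bt"]
    if core_path is not None:
        cmd += ["-ex", f"gcore {core_path}"]
    return cmd


def collect_hang_data(pid: int, core_path: Optional[str] = None,
                      timeout: float = 120.0) -> str:
    """Run gdb against a hung process and return its output as text.

    The returned text is meant to be appended to the test log. Failures
    are reported as a message rather than raised, so that a broken
    debugger does not hide the original hang.
    """
    if shutil.which("gdb") is None:
        return f"gdb not found; no stack traces collected for pid {pid}"
    try:
        result = subprocess.run(
            gdb_backtrace_command(pid, core_path),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return f"gdb timed out after {timeout}s on pid {pid}"
    return result.stdout + result.stderr
```

      A test wrapper that has decided a process is hung could call collect_hang_data(pid, core_path) for each hung PID and print the result, which would put the stack traces in the task log and leave the core file next to the test's WiredTiger directory for later upload.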

            Assignee:
            Ravi Giri
            Reporter:
            Keith Smith
            Votes:
            0
            Watchers:
            5
