WiredTiger / WT-6449

Hang analyzer for WT Evergreen tests

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT10.0.0, 4.9.0, 4.4.3
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Story Points: 3
    • Sprint: Storage - Ra 2020-11-16

      Triaging and diagnosing hang failures in automated Evergreen tests should be easier.  After determining that a test is hung, Evergreen should automatically collect and report data that will help with the initial triage and diagnosis of the problem.  Ideally we might collect:

      • What test programs were running at the time of the hang. 
      • The WiredTiger directory for those tests (I believe we already keep this for all tests)
      • Cores of the hung process(es), to help engineers determine why they were hung
      • Stack traces from the hung processes, to include in the Evergreen logs to facilitate triage.
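
      As a rough illustration of the collection steps listed above, here is a minimal Python sketch. It is not the resmoke hang analyzer and not an existing WiredTiger script. The process names (test_format, test_checkpoint), the cores output directory, and the helper names parse_ps, gdb_command, and analyze are all hypothetical. It assumes gdb is installed on the test host and that the script is run from whatever step Evergreen executes once a test is judged to be hung.

```python
import shutil
import subprocess
from pathlib import Path


def parse_ps(ps_output, process_names):
    """Return {pid: name} for lines of `ps` output whose command matches a wanted name."""
    wanted = set(process_names)
    found = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        # Some platforms report a full path in the command column; compare the basename.
        name = parts[1].strip().rsplit("/", 1)[-1]
        if name in wanted:
            found[int(parts[0])] = name
    return found


def find_pids(process_names):
    """List running processes and keep those that look like test programs."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, process_names)


def gdb_command(pid, core_dir):
    """Build a gdb invocation that prints all thread stacks, then writes a core file."""
    core_path = Path(core_dir) / f"core.{pid}"
    return [
        "gdb", "--batch", "-p", str(pid),
        "-ex", "thread apply all bt",
        "-ex", f"gcore {core_path}",
    ]


def analyze(process_names, core_dir="cores"):
    """Report hung test programs, their stack traces, and save a core for each."""
    if shutil.which("gdb") is None:
        print("gdb not found; cannot collect stacks or cores")
        return
    Path(core_dir).mkdir(parents=True, exist_ok=True)
    for pid, name in sorted(find_pids(process_names).items()):
        print(f"=== {name} (pid {pid}) ===")
        result = subprocess.run(gdb_command(pid, core_dir), capture_output=True, text=True)
        # The stack traces go to stdout so they end up in the task log.
        print(result.stdout)
        if result.returncode != 0:
            print(result.stderr)


if __name__ == "__main__":
    # Hypothetical test program names; a real list would come from the test suite.
    analyze(["test_format", "test_checkpoint"])
```

      Printing the traces to standard output puts them in the task log for triage, and the core files land in a single directory that could be archived alongside the WiredTiger directory. Attaching with gdb normally requires the same user as the hung process, or elevated privileges, so the host configuration matters here.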

      There is probably other stuff that would be useful as well.

      MongoDB's resmoke.py includes a hang-analyzer that they use for this purpose, buildscripts/resmokelib/commands/hang_analyzer.py.  We might be able to use it as the basis for a WT hang analyzer, or simply steal it outright.

            ravi.giri@mongodb.com Ravi Giri
            keith.smith@mongodb.com Keith Smith