WiredTiger / WT-6449

Develop (or steal) hang analyzer for WT Evergreen tests


    Details

    • Type: Improvement
    • Status: Needs Scheduling
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Triaging and diagnosing hangs in automated Evergreen tests should be easier. Once a test is determined to be hung, Evergreen should automatically collect and report data that helps with initial triage and diagnosis of the problem. Ideally we would collect:

      • Which test programs were running at the time of the hang.
      • The WiredTiger directory for those tests (I believe we already keep this for all tests).
      • Core files from the hung process(es), so engineers can determine why they hung.
      • Stack traces from the hung processes, included in the Evergreen logs to speed up triage.
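
A rough sketch of the collection step, in Python, assuming a "hang" is simply a test that exceeds a fixed wall-clock timeout and that gdb and gcore are installed on the test host with permission to attach to the test process. The names run_test, collect_diagnostics, and DEFAULT_TIMEOUT_SECS are hypothetical and are not taken from resmoke's hang analyzer or any existing WiredTiger script. When a test times out, the sketch writes a core file with gcore and captures all thread stacks with gdb while the process is still alive, then kills it.

```python
import shutil
import subprocess
import sys
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical limit: a test still running after this many seconds is treated as hung.
DEFAULT_TIMEOUT_SECS = 3600.0


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Attach gdb to `pid` in batch mode and print every thread's stack."""
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


def gcore_cmd(pid: int, prefix: str) -> List[str]:
    """Ask gcore to write a core file for `pid` named <prefix>.<pid>."""
    return ["gcore", "-o", prefix, str(pid)]


def collect_diagnostics(pid: int, core_prefix: str) -> str:
    """Dump a core of the hung process and return its stacks as text for the task log."""
    report = []
    if shutil.which("gcore"):
        subprocess.run(gcore_cmd(pid, core_prefix), capture_output=True)
    else:
        report.append("gcore not found; no core file written")
    if shutil.which("gdb"):
        result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
        report.append(result.stdout)
    else:
        report.append("gdb not found; no stacks collected")
    return "\n".join(report)


def run_test(
    argv: Sequence[str],
    timeout: float = DEFAULT_TIMEOUT_SECS,
    on_hang: Callable[[int, str], str] = collect_diagnostics,
    core_prefix: str = "core",
) -> Tuple[Optional[int], Optional[str]]:
    """Run one test program. Return (exit_code, None) if it finishes in time,
    or (None, diagnostics) if it was judged hung and killed."""
    proc = subprocess.Popen(list(argv))
    try:
        return proc.wait(timeout=timeout), None
    except subprocess.TimeoutExpired:
        # Record which test program hung, so the task log names it.
        print(f"hang suspected after {timeout}s: {' '.join(argv)} (pid {proc.pid})",
              file=sys.stderr)
        report = on_hang(proc.pid, core_prefix)  # collect while the process is still alive
        proc.kill()
        proc.wait()
        return None, report
```

This only inspects the direct child process. Tests that fork helper processes would also need the process tree walked so that every hung process gets a core and a stack dump, which is part of what a full hang analyzer like resmoke's handles.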

      Other data would probably be useful as well.

      MongoDB's resmoke.py includes a hang analyzer for this purpose (buildscripts/resmokelib/commands/hang_analyzer.py). We might use it as the basis for a WT hang analyzer, or simply steal it outright.


              People

              Assignee:
              Brian Lane
              Reporter:
              Keith Smith
              Votes:
              0
              Watchers:
              5

                Dates

                Created:
                Updated: