Evergreen / EVG-13967

user-data-done job failed for hours


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: next_quarter
    • Component/s: plt
    • Labels:
      None

      Description

      This host stayed in the provisioning state indefinitely, waiting for the user-data-done job to mark it as done. The job, in turn, kept erroring for hours.
      The cycle ended only when a user manually terminated the spawn host from the UI.

      Questions:
      1) Why does the context keep getting cancelled after 15 seconds?
      2) Why did it happen specifically with this host?
      3) Should we terminate the host ourselves after some number of attempts or after a set duration?
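
To make question 3 concrete, here is a minimal sketch of what a give-up rule could look like. It is not taken from the Evergreen codebase: `retryPolicy`, `shouldTerminate`, and the limits used in `main` are hypothetical names and values chosen for illustration only. The idea is that the job checks the number of failed attempts and the time spent provisioning, and terminates the host once either limit is reached.

```go
package main

import (
	"fmt"
	"time"
)

// retryPolicy is a hypothetical limit on how long the user-data-done
// check may keep failing for a single host. Not an Evergreen type.
type retryPolicy struct {
	maxAttempts int           // terminate after this many failed attempts (0 disables this limit)
	maxAge      time.Duration // terminate once provisioning has lasted this long (0 disables this limit)
}

// shouldTerminate reports whether the host should be terminated
// instead of being retried again.
func (p retryPolicy) shouldTerminate(failedAttempts int, provisioningFor time.Duration) bool {
	if p.maxAttempts > 0 && failedAttempts >= p.maxAttempts {
		return true
	}
	if p.maxAge > 0 && provisioningFor >= p.maxAge {
		return true
	}
	return false
}

func main() {
	// Assumed limits, for illustration only.
	policy := retryPolicy{maxAttempts: 20, maxAge: 30 * time.Minute}

	fmt.Println(policy.shouldTerminate(3, 5*time.Minute))  // false: under both limits
	fmt.Println(policy.shouldTerminate(20, 5*time.Minute)) // true: attempt limit reached
	fmt.Println(policy.shouldTerminate(3, 2*time.Hour))    // true: duration limit reached
}
```

Checking both limits covers the case in this ticket, where the job kept running and failing for hours, as well as a case where attempts are infrequent but the host has been provisioning for too long.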

            People

            Assignee:
            kimberly.tao Kim Tao
            Reporter:
            jonathan.brill Jonathan Brill
