Evergreen / EVG-15022

Agent can start while host is still "building"

Details

    • Task
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • None
    • current_quarter
    • plt

    Description

      When the app servers restart, the create host job cancels any EC2 fleet requests that are in progress. However, a cancelled EC2 fleet request might still succeed in starting the EC2 host. As a result, the agent can start and check in with the app server while the app server still believes the host is in the "building" state.

      • If the agent is up but the host is in the "building" state, we shouldn't trust the EC2 fleet result; instead, we should set the host to "starting" if it's currently "building". Otherwise, the agent will keep checking in unsuccessfully forever, because the app server never terminates it. The reaper will get it eventually, but we shouldn't rely on the reaper to clean up hosts for us.
      • As an additional safety net, if the host is in the "building" state, the host allocator should not simply delete the host document. Instead, it should send the host to the host termination job, which should handle the possibility that the host is actually alive so that it is properly cleaned up.
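
      The two changes above can be sketched roughly as follows. This is a minimal in-memory illustration, not Evergreen's actual code: Host, HostStore, TerminationQueue, OnAgentCheckIn, RetireBuildingHost, Terminate, and the instanceAlive/killInstance callbacks are all hypothetical names, and the callbacks stand in for whatever cloud provider calls the real termination job would make.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a simplified stand-in for a host's lifecycle state.
type Status string

const (
	StatusBuilding   Status = "building"
	StatusStarting   Status = "starting"
	StatusTerminated Status = "terminated"
)

// ErrNotBuilding is returned when a host is not in the "building" state.
var ErrNotBuilding = errors.New("host is not in the building state")

// Host is a hypothetical in-memory stand-in for a host document.
type Host struct {
	ID     string
	Status Status
}

// HostStore is a hypothetical in-memory replacement for the host collection.
type HostStore struct {
	hosts map[string]*Host
}

// NewHostStore builds a store from the given hosts.
func NewHostStore(hosts ...*Host) *HostStore {
	s := &HostStore{hosts: map[string]*Host{}}
	for _, h := range hosts {
		s.hosts[h.ID] = h
	}
	return s
}

// OnAgentCheckIn treats a check-in as proof that the instance exists: a host
// still marked "building" is moved to "starting"; other states are untouched.
func (s *HostStore) OnAgentCheckIn(id string) (Status, error) {
	h, ok := s.hosts[id]
	if !ok {
		return "", fmt.Errorf("host %q not found", id)
	}
	if h.Status == StatusBuilding {
		h.Status = StatusStarting
	}
	return h.Status, nil
}

// TerminationQueue is a hypothetical stand-in for the host termination job queue.
type TerminationQueue struct {
	IDs []string
}

// RetireBuildingHost is what the allocator would call instead of deleting the
// document: the host is kept and handed to the termination queue.
func (s *HostStore) RetireBuildingHost(id string, q *TerminationQueue) error {
	h, ok := s.hosts[id]
	if !ok {
		return fmt.Errorf("host %q not found", id)
	}
	if h.Status != StatusBuilding {
		return ErrNotBuilding
	}
	q.IDs = append(q.IDs, id)
	return nil
}

// Terminate models the termination job. instanceAlive and killInstance are
// hypothetical callbacks standing in for cloud provider calls.
func (s *HostStore) Terminate(id string, instanceAlive func(string) bool, killInstance func(string) error) error {
	h, ok := s.hosts[id]
	if !ok {
		return fmt.Errorf("host %q not found", id)
	}
	// The fleet request may have succeeded after it was cancelled, so check
	// for a live instance before marking the host terminated.
	if instanceAlive(id) {
		if err := killInstance(id); err != nil {
			return fmt.Errorf("terminating instance for host %q: %w", id, err)
		}
	}
	h.Status = StatusTerminated
	return nil
}
```

      In this sketch the check-in path only ever moves a host from "building" to "starting", so a check-in from a host in any other state leaves its status alone. The termination path marks the document terminated rather than deleting it, and only after any live instance has been dealt with.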

            People

              kimberly.tao@mongodb.com Kim Tao