Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-56167

Guarantee hang analyzer collects core dumps for sharded clusters, at minimum

    XMLWordPrintableJSON

Details

    • Fully Compatible
    • v5.0
    • STM 2021-06-14
    • 164
    • 2

    Description

      Attaching gdb and collecting diagnostics for all processes in a sharded cluster continues to time out after 15 minutes. BF-20581 is a recent example where only 6 of the 9 mongod processes were attached to. Server engineers may end up relying on good luck or having access to multiple occurrences to successfully interpret the cause of a hang.

      We should consider 1. reordering the steps in hang analyzer so a core dump can be captured for every mongod process even if the diagnostics against the live process cannot, or 2. we should consider sending a SIGABRT to any process gcore wasn't run on before the 15 minutes expire.

      Attachments

        Issue Links

          Activity

            People

              mikhail.shchatko@mongodb.com Mikhail Shchatko
              robert.guo@mongodb.com Robert Guo (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: