Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-56167

Guarantee hang analyzer collects core dumps for sharded clusters, at minimum

    • Fully Compatible
    • v5.0
    • STM 2021-06-14
    • 164
    • 2

      Attaching gdb and collecting diagnostics for all processes in a sharded cluster continues to time out after 15 minutes. BF-20581 is a recent example where only 6 of the 9 mongod processes were attached to. Server engineers may end up relying on good luck or having access to multiple occurrences to successfully interpret the cause of a hang.

      We should consider 1. reordering the steps in hang analyzer so a core dump can be captured for every mongod process even if the diagnostics against the live process cannot, or 2. we should consider sending a SIGABRT to any process gcore wasn't run on before the 15 minutes expire.

            mikhail.shchatko@mongodb.com Mikhail Shchatko
            robert.guo@mongodb.com Robert Guo (Inactive)
            2 Vote for this issue
            6 Start watching this issue