  1. Core Server
  2. SERVER-55875

Make the thread liveness monitor detect stuck disk I/O


Details

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3

    Description

      This new behavior would help in the HELP ticket incident. Although the Enterprise Watchdog monitors storage health, a Community edition mongod primary can remain stuck on a faulty drive for hours without stepping down. The Watchdog detects this problem quickly, but the Community edition has no comparable mechanism at all.

      While the Enterprise Watchdog will continue to provide the premium service, the Community edition will get a more generic, slower solution that still prevents a multi-hour outage. The reaction time will differ by design, preserving the service differentiation: the Watchdog can detect such an outage within 10-30 seconds (depending on configuration), while the thread liveness monitor will achieve the same result after 5-10 minutes of outage.
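
The detection idea above can be illustrated with a minimal sketch. This is not the server's implementation: the class name `DiskLivenessMonitor`, its parameters, and the `on_stall` callback are all hypothetical, and the 300-second deadline is only one value picked from the 5-10 minute range mentioned in this ticket. The assumption is that a probe thread performs small disk I/O and records progress only when the I/O returns, while a separate monitor thread checks how long ago progress was last recorded.

```python
import threading
import time


class DiskLivenessMonitor:
    """Hypothetical sketch: reports a stall when a disk probe stops making progress."""

    def __init__(self, probe, stall_deadline_s, on_stall, clock=time.monotonic):
        self._probe = probe                # callable doing one small disk I/O (assumption)
        self._deadline = stall_deadline_s  # e.g. 300-600 s, per the 5-10 minutes in the ticket
        self._on_stall = on_stall          # e.g. request a step-down (assumption)
        self._clock = clock                # injectable so the logic can be tested without waiting
        self._last_progress = clock()
        self._fired = False
        self._lock = threading.Lock()

    def probe_once(self):
        """Run one probe. If the I/O hangs, this never returns and no progress is recorded."""
        self._probe()
        with self._lock:
            self._last_progress = self._clock()

    def check(self):
        """Called from the monitor thread. Returns True while the probe looks stalled."""
        with self._lock:
            stalled = self._clock() - self._last_progress > self._deadline
            fire = stalled and not self._fired
            if fire:
                self._fired = True  # invoke the callback at most once in this sketch
        if fire:
            self._on_stall()
        return stalled
```

In a real deployment `probe_once` would run in a loop on its own thread, so that a hung write blocks only that thread, and `check` would run on a timer in another thread. The long deadline is what keeps this generic mechanism slower than a dedicated storage watchdog.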

      Assigning to shameek.ray to make this blocked on the PM ticket he is creating.

          People

            shameek.ray@mongodb.com Shameek Ray
            andrew.shuvalov@mongodb.com Andrew Shuvalov (Inactive)
            Votes: 0
            Watchers: 4
