Improve performance monitoring and alerting

    • Perf monitoring
    • Go Drivers
    • Not Needed
    • To Do

      Summary

      Make the Go Driver performance monitoring reliable, and make the performance alerts actionable.

      Currently, we don't trust that the performance monitoring reliably points out meaningful changes, and we typically ignore performance monitoring alerts. We should improve the performance monitoring so we trust it to tell us something useful, and make the alerts actionable so we can do something when we get alerted.

      Motivation

      Who is the affected end user?

      Go Driver devs directly. Go Driver users eventually.

      How does this affect the end user?

      Go Driver devs don't find out about performance regressions when they are introduced. Go Driver users may find regressions before the dev team does and then have to report them, which is a bad experience.

      How likely is it that this problem or use case will occur?

      We intermittently get users complaining about Go Driver performance regressions, especially when we make big changes (e.g. Go Driver v2 release). There's not a strong pattern for when performance regressions occur, but there's at least one valid complaint every 6 months (usually increased memory usage or CPU usage).

      If the problem does occur, what are the consequences and how severe are they?

      The consequences for Go Driver devs are usually lost productivity and lowered morale. We typically get tickets about performance regressions months after we made the changes that may have caused them, so we've lost all context on those changes by the time we investigate.

      The consequences for Go Driver users may be confusion, increased resource usage, increased cost, and ultimately a worse customer experience.

      Is this issue urgent?

      No.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      No.

      Cast of Characters

      Engineering Lead:
      Document Author:
      POCers:
      Product Owner:
      Program Manager:
      Stakeholders:

      Channels & Docs

      Slack Channel

      [Scope Document|some.url]

      [Technical Design Document|some.url]

            Assignee:
            Unassigned
            Reporter:
            Matt Dale
            Votes:
            0
            Watchers:
            1

              Created:
              Updated: