Major - P3
Changes to serverStatus and FTDC output – once this project is closed we will share full details.
Description of Linked Ticket
Expose metrics to understand when connection storms happen, what they look like, and the impact they have on our clusters. By the end of this ticket, the following questions will be answered:
- How often does a customer experience a connection storm?
- When a connection storm happens, what does it look like?
- How many applications are involved, how many open connections are there per IP, and what is the rate at which new connections are created?
- How often do operations wait for connection establishment?
- How often do connection storm mitigation actions (maxConnecting) happen?
- What connection settings do customers use?
- How long does connection establishment take?
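As an illustration of the "rate at which new connections are created" question, here is a minimal sketch of how a rate could be derived from two samples of a cumulative connection counter. It assumes the counter behaves like the existing connections.totalCreated field in serverStatus. The ConnectionSample container, the function names, and the storm threshold are hypothetical and are not part of this project's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSample:
    """One observation of connection counters (hypothetical container)."""
    timestamp_s: float   # sample time in seconds
    total_created: int   # cumulative connections created (assumed: connections.totalCreated)
    current: int         # currently open connections (assumed: connections.current)


def new_connection_rate(earlier: ConnectionSample, later: ConnectionSample) -> float:
    """Return new connections per second between two samples."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    created = later.total_created - earlier.total_created
    if created < 0:
        # The cumulative counter went backwards, which we assume means the
        # process restarted; count only what was created since the restart.
        created = later.total_created
    return created / elapsed


def looks_like_storm(earlier: ConnectionSample, later: ConnectionSample,
                     threshold_per_s: float = 100.0) -> bool:
    """Flag an interval whose creation rate meets an illustrative threshold."""
    return new_connection_rate(earlier, later) >= threshold_per_s
```

For example, two samples taken 10 seconds apart with total_created going from 1000 to 3000 give a rate of 200 new connections per second. The threshold of 100 per second is a placeholder only; determining what rate actually characterizes a storm is one of the questions this ticket is meant to answer.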
Connection storms are a well-known problem across our customer base, but right now our understanding of this problem is based on customer anecdotes and support cases. This project will help us understand, fleetwide, how this problem takes shape, which factors influence connection storms, which mitigation techniques are helpful, and whether new features delivered as part of Operational Resilience are making an impact for our customers.
Cast of Characters
- Product Owner:
- Project Lead:
- Program Manager:
- Drivers Contact:
Technical Design Document