[DRIVERS-1981] Reconsider interaction between srvMaxHosts and SRV polling Created: 10/Nov/21 Updated: 06/Jun/23 |
|
| Status: | Backlog |
| Project: | Drivers |
| Component/s: | Performance, SRV Polling |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Minor - P4 |
| Reporter: | Jeremy Mikola | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | leads-triage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Driver Changes: | Needed |
| Description |
Summary
Opening this to record a conversation that occurred over Slack regarding srvMaxHosts (DRIVERS-1519). When srvMaxHosts is being used, drivers currently only drop connections if existing hosts are unavailable or not present in the most recently polled SRV records. This leads to "sticky" behavior whereby an application is more likely to keep the hosts it originally selected even as new hosts are added to a cluster over time (e.g. when an Atlas cluster scales up). jeff.yemin's proposal was to reduce the stickiness and allow existing mongos connections to be exchanged more frequently:
To avoid unnecessary churn on each polling interval, Jeff also suggested storing a snapshot of the most recent SRV results so that reshuffling need only happen if the SRV results have changed (i.e. mongos hosts are added or removed from the cluster). To present both sides of the argument, james.kovacs's response follows:
Motivation
Who is the affected end user?
Customers using srvMaxHosts in long-running applications.
How does this affect the end user?
As the number of mongoses in a cluster changes over time, app servers using srvMaxHosts might stick to their originally selected mongos hosts. Across the entire application, this could lead to an imbalance in connections to mongos hosts.
How likely is it that this problem or use case will occur?
This will occur in applications where the cluster expands or contracts in size (i.e. mongoses are added or removed and SRV records are updated).
If the problem does occur, what are the consequences and how severe are they?
There may be a performance concern where some mongoses will retain more connections/utilization than others.
Is this issue urgent?
No. This ticket is being opened to record a Slack discussion in the event that we need to revisit this feature down the line.
Is this ticket required by a downstream team?
No.
Is this ticket only for tests?
No. |
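For illustration only, the snapshot idea mentioned in the Summary (reshuffle the selected hosts only when the polled SRV results differ from the previous poll) could be sketched roughly as follows. This is one possible reading of the proposal, not driver or specification code: the function name `reselect_hosts`, its parameters, and the choice to redraw the whole selection on any change are all assumptions, and host availability is ignored for brevity.

```python
import random
from typing import FrozenSet, Iterable, Optional, Set, Tuple


def reselect_hosts(
    current: Set[str],
    polled: Iterable[str],
    last_snapshot: Optional[FrozenSet[str]],
    srv_max_hosts: int,
    rng: Optional[random.Random] = None,
) -> Tuple[Set[str], FrozenSet[str]]:
    """Hypothetical helper: return (selected hosts, new snapshot)."""
    rng = rng or random.Random()
    snapshot = frozenset(polled)

    # SRV results are identical to the previous poll: keep the current
    # selection so that no connections churn on this polling interval.
    if snapshot == last_snapshot:
        return set(current), snapshot

    # No limit configured (0), or fewer records than the limit:
    # use every polled host.
    if srv_max_hosts <= 0 or len(snapshot) <= srv_max_hosts:
        return set(snapshot), snapshot

    # SRV results changed: draw a fresh random subset of srvMaxHosts hosts
    # (assumed interpretation of "reshuffling").
    return set(rng.sample(sorted(snapshot), srv_max_hosts)), snapshot
```

A caller would keep the returned snapshot between polls and pass it back in as `last_snapshot`. Compared with the current behavior, an unchanged poll is a no-op, while any added or removed SRV record gives every application a chance to pick up new mongos hosts.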
| Comments |
| Comment by Jeffrey Yemin [ 10/Nov/21 ] |
Can you clarify what unavailable means in this context? |