[SERVER-61692] Reproduce connection storm behavior and try mitigating it with a load balancer
Created: 22/Nov/21  Updated: 09/Mar/22  Resolved: 09/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Matthew Saltz (Inactive)
Assignee: George Wangensteen
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-61693 Add connection storm perf workload to... Open
Sprint: Service Arch 2022-2-07, Service Arch 2022-2-21, Service Arch 2022-03-07, Service Arch 2022-03-21
Participants:
Story Points: 10

 Description   

We should write an ad hoc test that reproduces connection storms. We should then run the same test with an L4 load balancer placed between the clients and mongos, to see whether the load balancer prevents a connection storm as we intend it to. As part of this we should:

  • Define exactly which workload we expect to reproduce a connection storm, along with the symptoms we expect a connection storm to produce. This may involve reviewing previous HELP tickets and/or customer issues, and talking to TSEs. My best current understanding is that load balancers are intended to help when the number of app servers scales up rapidly while the minimum connection pool size is set to some non-zero value. One motivating example might be the scenario detailed in this blog post.
  • Reproduce that scenario in the easiest way possible. At this point we do not care about getting the reproducer into our continuous integration suites. That will be done as follow-on work in SERVER-61693.
  • Once we can reliably reproduce the connection storm behavior, run the exact same workload with an L4 load balancer deployed (probably something like Elastic Load Balancer on AWS). Check whether the issue goes away, and if it does not, document the behavior.
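A back-of-envelope model of the scale-up scenario in the first bullet can help size the reproducer before anything is deployed. The sketch below makes several assumptions that are not stated in this ticket. Without a load balancer, the driver on each app server keeps one pool per mongos and fills each pool to minPoolSize as soon as the client starts. Behind an L4 load balancer, the driver sees a single endpoint and keeps a single pool, and the balancer spreads those connections evenly over the mongos. `ScaleUpEvent` and `burst_per_mongos` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleUpEvent:
    # Hypothetical parameters; none of these values come from the ticket.
    new_app_servers: int  # app servers started during the scale-up
    min_pool_size: int    # driver minPoolSize configured on every app server
    mongos_count: int     # number of mongos routers in the cluster


def burst_per_mongos(event: ScaleUpEvent, behind_lb: bool) -> float:
    """Estimate how many new connections each mongos must accept in the burst.

    Assumption: every new client fills its pool(s) to min_pool_size
    immediately on startup.
    """
    if behind_lb:
        # Assumption: one pool per client, pointed at the load balancer,
        # whose connections are spread evenly across all mongos.
        return event.new_app_servers * event.min_pool_size / event.mongos_count
    # Assumption: one pool per mongos per client, each filled to min_pool_size,
    # so every mongos sees a full pool from every new client.
    return float(event.new_app_servers * event.min_pool_size)
```

For example, with a hypothetical `ScaleUpEvent(new_app_servers=200, min_pool_size=10, mongos_count=4)`, `burst_per_mongos(event, behind_lb=False)` returns `2000.0` and `burst_per_mongos(event, behind_lb=True)` returns `500.0`. This is only a model. It says nothing about how quickly mongos can accept, authenticate, and serve those connections, and whether the reduction holds in practice is exactly what the third bullet is meant to check.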
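For the "easiest way possible" reproducer in the second bullet, one crude starting point is to have many simulated app servers each open minPoolSize TCP connections at the same moment. The sketch below does this with Python's standard `asyncio` streams. It assumes raw TCP connects are an acceptable first approximation. It does not perform the MongoDB handshake, authentication, or TLS that a real driver connection would. The function names and the local demo server are hypothetical. The demo only checks that the harness itself works against a throwaway local listener.

```python
import asyncio
import time


async def open_pool(host: str, port: int, min_pool_size: int, timeout: float) -> list:
    """Simulate one app server filling its pool: open min_pool_size TCP
    connections concurrently and return the writers that connected."""

    async def connect_once():
        try:
            _reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout
            )
            return writer
        except (OSError, asyncio.TimeoutError):
            return None  # refused, reset, or timed out

    writers = await asyncio.gather(*(connect_once() for _ in range(min_pool_size)))
    return [w for w in writers if w is not None]


async def connection_storm(host: str, port: int, app_servers: int,
                           min_pool_size: int, timeout: float = 5.0) -> dict:
    """Start app_servers simulated clients at once and report how many of the
    app_servers * min_pool_size connection attempts succeeded."""
    start = time.monotonic()
    pools = await asyncio.gather(
        *(open_pool(host, port, min_pool_size, timeout) for _ in range(app_servers))
    )
    elapsed = time.monotonic() - start
    writers = [w for pool in pools for w in pool]
    # Close every connection so repeated runs start from a clean slate.
    for w in writers:
        w.close()
    await asyncio.gather(*(w.wait_closed() for w in writers), return_exceptions=True)
    return {
        "attempted": app_servers * min_pool_size,
        "succeeded": len(writers),
        "elapsed_s": elapsed,
    }


async def demo() -> dict:
    """Run a small storm against a throwaway local TCP server."""

    async def hold_open(reader, writer):
        try:
            await reader.read()  # returns b"" once the client closes its end
        finally:
            writer.close()

    server = await asyncio.start_server(hold_open, "127.0.0.1", 0, backlog=256)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await connection_storm("127.0.0.1", port, app_servers=10, min_pool_size=5)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

To get a crude before/after comparison for the third bullet, point `connection_storm` first at a mongos and then at the load balancer's listener, keeping the same `app_servers` and `min_pool_size`. The symptoms defined in the first bullet would still need to be checked on the server side, for example in the mongos logs and the `connections` section of `serverStatus`. Large runs may also hit the client machine's open-file limit before they stress the server.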

As we do this, we should document our progress and the steps we've taken to get there, either in the ticket or in a Google Doc.


Generated at Thu Feb 08 05:53:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.