Summary
When the driver's monitors for the connected nodes in the cluster use the streaming protocol, they must process only the most recent hello response from each node and ignore all earlier unprocessed hello responses.
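At the level of decoded messages, the rule can be sketched in Python. Python is used here only for illustration; this is not the Node driver's implementation. `HelloResponse` and `process_latest` are hypothetical names, and the sketch assumes the pending responses are available in the order they arrived from the node.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class HelloResponse:
    # Hypothetical decoded hello reply; real replies carry many more fields.
    request_id: int
    is_writable_primary: bool


def process_latest(
    pending: Iterable[HelloResponse],
    apply: Callable[[HelloResponse], None],
) -> int:
    """Apply only the newest pending response and return how many were skipped.

    Assumes `pending` yields responses in the order they arrived from the node.
    """
    latest: Optional[HelloResponse] = None
    skipped = 0
    for response in pending:
        if latest is not None:
            skipped += 1  # the previous response is superseded and never applied
        latest = response
    if latest is not None:
        apply(latest)  # e.g. update the server description exactly once
    return skipped


# Usage: three responses piled up while the monitor was not running.
applied = []
skipped = process_latest(
    [HelloResponse(1, True), HelloResponse(2, True), HelloResponse(3, False)],
    applied.append,
)
```

Only the last response reaches `apply`, so the server description is updated once, from the freshest state the node reported.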
Motivation
In FaaS environments such as AWS Lambda, when handler functions are idle, their execution environment is put to sleep or frozen, while the network connections already established to the nodes in the cluster remain open. Because the nodes are unaware that the driver is in this state, they keep sending hello responses according to the SDAM streaming protocol, and these responses accumulate in the execution environment's incoming TCP buffer. When the handler function is invoked again, single-threaded drivers such as Node process all of the buffered hello responses before any of the handler function's code can run. In some cases, this processing takes a significant amount of time. To reduce this lag, drivers may choose to skip every hello response in the incoming TCP buffer except the most recent one.
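At the byte level, skipping to the most recent response means finding message boundaries in the buffered data without decoding every message. The Python sketch below assumes the standard MongoDB wire protocol header, in which each message starts with a little-endian int32 `messageLength` that counts the whole message, including the 16-byte header. The function names and the `make_frame` helper are hypothetical; the frames it builds carry dummy bodies rather than real OP_MSG payloads, and a real driver would still need to decode and validate the selected message.

```python
import struct
from typing import List, Optional, Tuple

# Standard MongoDB message header: messageLength, requestID, responseTo, opCode,
# each a little-endian int32. messageLength includes these 16 bytes.
HEADER_SIZE = 16
OP_MSG = 2013


def split_frames(buf: bytes) -> Tuple[List[bytes], bytes]:
    """Split buffered bytes into complete messages plus any trailing partial message."""
    frames: List[bytes] = []
    offset = 0
    while len(buf) - offset >= 4:
        (length,) = struct.unpack_from("<i", buf, offset)
        if length < HEADER_SIZE:
            raise ValueError(f"invalid messageLength {length} at offset {offset}")
        if len(buf) - offset < length:
            break  # the last message has not fully arrived yet
        frames.append(buf[offset:offset + length])
        offset += length
    return frames, buf[offset:]


def latest_complete_frame(buf: bytes) -> Tuple[Optional[bytes], int, bytes]:
    """Return (newest complete message, count of older messages skipped, leftover bytes)."""
    frames, rest = split_frames(buf)
    if not frames:
        return None, 0, rest
    # Only frames[-1] would be handed to the decoder; older frames are never decoded.
    return frames[-1], len(frames) - 1, rest


def make_frame(request_id: int, payload: bytes) -> bytes:
    """Hypothetical test helper: a valid header followed by a dummy (non-OP_MSG) body."""
    header = struct.pack("<iiii", HEADER_SIZE + len(payload), request_id, 0, OP_MSG)
    return header + payload


# Usage: three complete responses followed by the first 10 bytes of a fourth.
buf = make_frame(1, b"a") + make_frame(2, b"bb") + make_frame(3, b"ccc")
buf += make_frame(4, b"dddd")[:10]
latest, skipped, rest = latest_complete_frame(buf)
latest_request_id = struct.unpack_from("<i", latest, 4)[0]  # requestID field
```

The trailing partial bytes are returned rather than discarded so the next read can complete that message. Whether to act on the newest complete response immediately or wait for the partial one is a design choice this sketch leaves to the caller.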
Who is the affected end user?
Users of single-threaded drivers, such as Node. Multi-threaded drivers may not be affected because their monitoring connections run on a separate thread, but they may still choose to implement this, since the most recent hello response is always the source of truth.
How does this affect the end user?
Performance issues in functions that have been idle for a significant amount of time, or in functions that were warmed up ahead of time, before their first invocation (for example, Lambda Provisioned Concurrency).
How likely is it that this problem or use case will occur?
We have already received several related tickets from Node driver users running on Lambda.
If the problem does occur, what are the consequences and how severe are they?
Severe performance issues on the first execution of a function after it has been idle.
Is this issue urgent?
Unknown for drivers other than Node; the Node driver has already implemented this in version 4.7.0.
Is this ticket required by a downstream team?
No.
Is this ticket only for tests?
No.
Issue links
- duplicates: DRIVERS-2578 Switch to polling monitoring when running within a FaaS environment (Implementing)
- related to: DRIVERS-2578 Switch to polling monitoring when running within a FaaS environment (Implementing)