[DRIVERS-2364] Only Process Last Monitor Heartbeat When Using Streaming Protocol Created: 21/Jun/22  Updated: 25/Sep/23

Status: Backlog
Project: Drivers
Component/s: FaaS, SDAM
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Durran Jordan Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates DRIVERS-2578 Switch to polling monitoring when run... Implementing
Related
related to DRIVERS-2578 Switch to polling monitoring when run... Implementing
Driver Changes: Needed

 Description   

Summary

When a driver's monitors use the streaming protocol, each monitor must process only the most recent hello response from its node and discard all earlier unprocessed hello responses.

Motivation

In FaaS environments such as AWS Lambda, the execution environment of an idle handler function is put to sleep or frozen while the network connections it has established to the nodes in the cluster remain open. Because the nodes are unaware that the driver is in this state, they continue to send hello responses according to the SDAM streaming protocol, and these accumulate in the execution environment's incoming TCP buffer. When the handler function is invoked again, single-threaded drivers, such as Node, process all of the buffered hello responses before any handler code can run. In some cases this processing takes a significant amount of time. To reduce this lag, drivers may choose to discard every hello response in the incoming TCP buffer except the most recent one.
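A minimal sketch of the idea, not the Node driver's actual implementation: assuming the monitor has already read everything pending on its socket into one byte array, it can walk the length-prefixed messages (each wire protocol message begins with a 16-byte header whose first field is a little-endian int32 total length) and keep only the last complete one. The names takeLatestMessage and DrainResult are hypothetical and used only for illustration.

```typescript
// Hypothetical helper for illustration; not taken from any driver.
const HEADER_SIZE = 16; // standard wire protocol message header size in bytes

interface DrainResult {
  latest: Uint8Array | null; // most recent complete message, if any
  skipped: number;           // older complete messages that were discarded
  remainder: Uint8Array;     // trailing bytes of a message still in flight
}

function takeLatestMessage(buffered: Uint8Array): DrainResult {
  const view = new DataView(buffered.buffer, buffered.byteOffset, buffered.byteLength);
  let offset = 0;
  let complete = 0;
  let latest: Uint8Array | null = null;

  // Walk message by message using only the length prefix; nothing is parsed.
  while (buffered.byteLength - offset >= 4) {
    const length = view.getInt32(offset, true); // little-endian total length
    if (length < HEADER_SIZE) {
      throw new Error(`invalid message length: ${length}`);
    }
    if (offset + length > buffered.byteLength) {
      break; // the last message has not fully arrived yet
    }
    latest = buffered.subarray(offset, offset + length);
    offset += length;
    complete += 1;
  }

  return {
    latest,
    skipped: Math.max(0, complete - 1),
    remainder: buffered.subarray(offset),
  };
}
```

Only the message returned in latest would then be decoded and handed to SDAM; the remainder is kept and prepended to the next read. This applies only to the monitoring connection, where each streamed hello response supersedes the previous one.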

Who is the affected end user?

Users of single-threaded drivers, such as Node. Multi-threaded drivers may not be affected, since their monitoring runs on a separate thread, but they may still choose to implement this because the most recent hello response is always the source of truth.

How does this affect the end user?

Performance issues in functions that have been idle for a significant amount of time, or in functions that were warmed up before their first execution (for example, Lambda Provisioned Concurrency).

How likely is it that this problem or use case will occur?

We have already received several tickets related to this from Node Lambda users.

If the problem does occur, what are the consequences and how severe are they?

Severe performance issues on first execution of an idle function.

Is this issue urgent?

Unknown for drivers other than Node. Node has already implemented this in driver 4.7.0.

Is this ticket required by a downstream team?

No.

Is this ticket only for tests?

No.

