Details
Type: Bug
Status: Closed
Priority: Major - P3
Resolution: Fixed
Description
AWS Lambda (and likely other FaaS services) pauses the app process when it is idle and resumes it later on demand, when a new request comes in. This pause/resume behavior causes SDAM heartbeats to time out, which clears the connection pool and marks the server Unknown. The result is connection churn and increased latency, because the servers must be rediscovered and all pooled connections recreated.
This behavior can be simulated locally using SIGSTOP + SIGCONT:
2022-03-25 14:40:38,915 INFO event_loggers Heartbeat sent to server ('localhost', 27018)
2022-03-25 14:40:38,916 INFO event_loggers Heartbeat sent to server ('localhost', 27019)
[1] + 93208 suspended (signal) python repro-DRIVERS-2246.py
$ sleep 60
$ kill -SIGCONT 93208
2022-03-25 14:42:16,835 WARNING event_loggers Heartbeat to server ('localhost', 27017) failed with error localhost:27017: timed out
2022-03-25 14:42:16,835 WARNING event_loggers Heartbeat to server ('localhost', 27018) failed with error localhost:27018: timed out
2022-03-25 14:42:16,836 INFO event_loggers Heartbeat sent to server ('localhost', 27017)
2022-03-25 14:42:16,836 INFO event_loggers Heartbeat sent to server ('localhost', 27018)
2022-03-25 14:42:16,836 WARNING event_loggers Heartbeat to server ('localhost', 27019) failed with error localhost:27019: timed out
2022-03-25 14:42:16,837 INFO event_loggers Heartbeat sent to server ('localhost', 27019)
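The failure can be modeled without a MongoDB server or a real SIGSTOP. The sketch below is a simplified, hypothetical monitor wait loop, not PyMongo's actual code: it arms a deadline on `time.monotonic()`, which keeps advancing while the process is stopped, and checks that deadline before polling the socket. `time.sleep()` stands in for the freeze, and the names `wait_for_read_naive`, `simulate_pause`, and `POLL_INTERVAL` are illustrative assumptions.

```python
import select
import socket
import time
from typing import Callable

# Hypothetical: the loop wakes up periodically (e.g. to check for cancellation).
POLL_INTERVAL = 0.5


def wait_for_read_naive(sock: socket.socket, deadline: float) -> None:
    """Block until `sock` is readable or the monotonic `deadline` passes.

    The deadline is checked *before* polling, so a process that was frozen
    past the deadline times out without ever looking at the socket.
    """
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout("timed out")
        readable, _, _ = select.select([sock], [], [], min(remaining, POLL_INTERVAL))
        if readable:
            return


def simulate_pause(
    wait_for_read: Callable[[socket.socket, float], None], timeout: float = 0.2
) -> bytes:
    """Send a 'heartbeat', let the peer reply, then 'freeze' past the deadline."""
    client, server = socket.socketpair()
    try:
        deadline = time.monotonic() + timeout  # heartbeat sent, deadline armed
        server.sendall(b"hello reply")          # the server answers promptly...
        time.sleep(timeout * 2)                 # ...but the process is frozen past the deadline
        wait_for_read(client, deadline)
        return client.recv(1024)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    try:
        simulate_pause(wait_for_read_naive)
    except socket.timeout:
        print("heartbeat timed out even though the reply was already buffered")
```

Because the deadline is checked first, the reply that arrived during the pause is never read and the heartbeat is reported as timed out, which mirrors the failures in the log.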
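We can mitigate this issue by performing one final non-blocking check to see whether the socket is readable after the heartbeat times out. If the server's reply arrived while the process was paused, it is already buffered, so the heartbeat succeeds instead of failing: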
2022-03-29 15:24:52,344 INFO event_loggers Heartbeat sent to server ('localhost', 27017)
[1] + 30988 suspended (signal) python3.10 repro-DRIVERS-2246.py
$ sleep 30 && kill -SIGCONT 30988
2022-03-29 15:25:37,944 INFO event_loggers Heartbeat to server ('localhost', 27017) succeeded with reply {'topologyVersion': ...
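A minimal sketch of this mitigation, assuming a hypothetical deadline-based wait loop rather than PyMongo's actual implementation: once the deadline has passed, the loop still calls `select()` one more time with a zero timeout (a non-blocking readability check) and raises only if nothing is buffered. The names `wait_for_read`, `simulate_pause`, and `POLL_INTERVAL` are illustrative. One caveat: a TLS socket can hold already-decrypted data that `select()` does not report, so a real driver would also need to consult something like `ssl.SSLSocket.pending()`; this sketch ignores that case.

```python
import select
import socket
import time
from typing import Callable

# Hypothetical: the loop wakes up periodically (e.g. to check for cancellation).
POLL_INTERVAL = 0.5


def wait_for_read(sock: socket.socket, deadline: float) -> None:
    """Block until `sock` is readable or the monotonic `deadline` passes.

    After the deadline has passed, poll once more with a zero timeout before
    giving up, so a reply that arrived while the process was frozen is still
    picked up.
    """
    while True:
        remaining = deadline - time.monotonic()
        timed_out = remaining <= 0
        # Once timed out this is 0, which makes select() a non-blocking check.
        timeout = max(min(remaining, POLL_INTERVAL), 0)
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            return
        if timed_out:
            raise socket.timeout("timed out")


def simulate_pause(
    wait: Callable[[socket.socket, float], None], timeout: float = 0.2
) -> bytes:
    """Send a 'heartbeat', let the peer reply, then 'freeze' past the deadline."""
    client, server = socket.socketpair()
    try:
        deadline = time.monotonic() + timeout  # heartbeat sent, deadline armed
        server.sendall(b"hello reply")          # the server answers promptly...
        time.sleep(timeout * 2)                 # ...but the process is frozen past the deadline
        wait(client, deadline)
        return client.recv(1024)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    # The buffered reply is returned instead of a timeout: prints b'hello reply'
    print(simulate_pause(wait_for_read))
```

The final zero-timeout poll costs a single system call on the timeout path, and a genuinely unresponsive server still produces a timeout because nothing is buffered.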
Issue Links
causes
- PYTHON-3191 Test Failure - Versioned API requireApiVersion1 (Closed)
is related to
- PYTHON-2448 TLS handshake fails sometimes when running on AWS Lambda (Closed)
related to
- DRIVERS-1598 Solve for serverless/lambda connection pool issues (Closed)
- DRIVERS-2246 Heartbeat build up with streaming protocol when driver process is stopped (FAAS) (Closed)