Type: Spec Change
Resolution: Won't Do
Priority: Unknown
Component/s: Load Balancer, SDAM, Serverless
Summary
While investigating CLOUDP-104364, it was mentioned that the Atlas serverless proxy only includes serviceId in its hello response for the initial handshake, which it identifies by the presence of a client field in the hello command.
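The proxy behavior described above can be modeled roughly as follows. This is a minimal Python sketch, not the proxy's or any driver's actual code. Commands and replies are plain dicts, the build_hello and proxy_reply helpers are hypothetical, the driver metadata values are placeholders, and serviceId is a string here rather than an ObjectId.

```python
from typing import Any, Dict


def build_hello(handshake: bool) -> Dict[str, Any]:
    """Build a hypothetical hello command; only a handshake carries "client"."""
    cmd: Dict[str, Any] = {"hello": 1}
    if handshake:
        # Placeholder client metadata; a real driver sends its own name/version.
        cmd["client"] = {"driver": {"name": "example-driver", "version": "0.0"}}
    return cmd


def proxy_reply(cmd: Dict[str, Any], service_id: str) -> Dict[str, Any]:
    """Assumed proxy behavior: serviceId is returned only if "client" is present."""
    reply: Dict[str, Any] = {"ok": 1}
    if "client" in cmd:
        reply["serviceId"] = service_id
    return reply
```

In this model, proxy_reply(build_hello(True), "sid") contains serviceId, while proxy_reply(build_hello(False), "sid") does not, which is why a driver that skips the handshake on reconnection never learns the serviceId.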
This uncovered a subtle bug in libmongoc (CDRIVER-4207). Errors and timeouts encountered during monitoring typically cause libmongoc to construct a handshake command when reconnecting; errors and timeouts encountered during application usage do not. In single-threaded SDAM, the monitoring and application sockets are one and the same, so an application error can lead to a reconnection that skips the handshake.
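The reconnection logic can be sketched as below. This is a minimal Python model under stated assumptions: SingleThreadedNode, record_error, next_hello and the needs_handshake flag are hypothetical names (they are not libmongoc's mongoc_topology_scanner_node_t or its last_failed field), and one node stands in for the single socket that single-threaded SDAM shares between monitoring and application operations. With fixed=False it mirrors the behavior described above; with fixed=True it treats both error sources alike, as the first suggestion in this ticket asks.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SingleThreadedNode:
    """Hypothetical node whose one socket serves monitoring and the application."""

    fixed: bool  # True: treat application errors like monitoring errors
    needs_handshake: bool = True  # a brand-new connection starts with a handshake

    def record_error(self, source: str) -> None:
        """Record an error/timeout ("monitoring" or "application") on the shared socket."""
        if source == "monitoring" or self.fixed:
            self.needs_handshake = True
        # Otherwise (the bug being modeled): the application error closes the
        # socket, but the node is not flagged, so the handshake is skipped.

    def next_hello(self) -> Dict[str, Any]:
        """Build the hello sent on the next (re)connection attempt."""
        cmd: Dict[str, Any] = {"hello": 1}
        if self.needs_handshake:
            # Placeholder metadata; only a handshake hello includes "client".
            cmd["client"] = {"driver": {"name": "example-driver", "version": "0.0"}}
            self.needs_handshake = False
        return cmd
```

For example, after n = SingleThreadedNode(fixed=False) and an initial n.next_hello(), calling n.record_error("application") means the next hello lacks the client field, so a proxy that keys on that field would omit serviceId. With fixed=True the same sequence produces a handshake hello.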
Some suggestions for this ticket:
- If it is not already noted in the SDAM spec, state that single-threaded implementations should treat monitoring and application errors the same for the purpose of sending a handshake hello when reconnecting.
- Consider documenting the Atlas serverless proxy behavior in a drivers specification. I realize we don't have a serverless spec (only the Serverless Testing spec), but perhaps this warrants a note in either the SDAM or Handshake spec.
Motivation
Who is the affected end user?
Single-threaded SDAM implementations.
How does this affect the end user?
Users connected to Atlas Serverless who encounter an application error or timeout may be blocked by client-side load balancer errors, because subsequent hello responses lack a serviceId.
How likely is it that this problem or use case will occur?
This may occur after any application error/timeout in a single-threaded driver connected to Atlas Serverless.
If the problem does occur, what are the consequences and how severe are they?
The application will likely be blocked for the lifetime of the MongoClient.
Is this issue urgent?
The spec change itself is not urgent and does not block libmongoc from fixing its bug.
Is this ticket required by a downstream team?
No.
Is this ticket only for tests?
No, but it does affect serverless testing, since our spec tests trigger socket errors via fail points (see PHPLIB-717).
Is related to:
- CDRIVER-4207 mongoc_topology_scanner_node_t.last_failed ignores errors outside of monitoring for singled-threaded SDAM (Closed)
- PHPLIB-717 Test Serverless behind a load balancer to prevent test breakage (Closed)