[DRIVERS-2568] Zero connection overhead SDAM Created: 03/Mar/23  Updated: 25/Sep/23

Status: Backlog
Project: Drivers
Component/s: CMAP, FaaS, Performance, SDAM
Fix Version/s: None

Type: Improvement Priority: Unknown
Reporter: Shane Harvey Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: FY24Q4, maintainers-triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Driver Changes: Needed

 Description   

Summary

Streaming SDAM mode requires 2 extra connections per server: one for streaming hello, one for RTT measurements.
Polling SDAM mode requires 1 extra connection per server.

We could make Polling SDAM zero connection overhead (in the best case, assuming no pool contention) if SDAM and application operations shared a single pooled connection. The initial connection sequence would be (see the sketch after this list):

  • Monitor creates unauthenticated connection, runs hello handshake, discovers server, checks connection back into the pool.
  • Application thread checks connection out of the pool, runs auth handshake, executes operation, checks connection back into the pool.
  • Monitor wakes for next check and reuses the pooled connection.
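
Below is a minimal sketch of that sequence in Python. The Pool, Connection, monitor_check, and run_operation names are purely illustrative and not an existing driver API; it assumes auth is deferred until the application thread first uses the connection.

{code:python}
# Illustrative sketch only: Pool, Connection, and the helper functions are
# hypothetical names, not an existing driver API.

class Connection:
    def __init__(self, address):
        self.address = address
        self.authenticated = False

    def hello(self):
        """Run the (unauthenticated) hello handshake and return the response."""
        return {"ok": 1}

    def ensure_authenticated(self, credentials):
        """Complete the auth handshake the first time the application uses
        this connection; the monitor never needs this step."""
        if credentials and not self.authenticated:
            self.authenticated = True


class Pool:
    def __init__(self, address):
        self.address = address
        self._idle = []

    def check_out(self):
        # Reuse an idle connection if one exists, otherwise create a new one.
        return self._idle.pop() if self._idle else Connection(self.address)

    def check_in(self, conn):
        self._idle.append(conn)


def monitor_check(pool):
    """One SDAM check: borrow a pooled connection, run hello, check it back in."""
    conn = pool.check_out()
    description = conn.hello()
    pool.check_in(conn)
    return description


def run_operation(pool, credentials, command):
    """One application operation on the same shared pooled connection."""
    conn = pool.check_out()
    conn.ensure_authenticated(credentials)
    result = {"ok": 1, "command": command}  # placeholder for the real round trip
    pool.check_in(conn)
    return result


pool = Pool(("localhost", 27017))
monitor_check(pool)                            # 1: monitor discovers the server
run_operation(pool, {"user": "app"}, "find")   # 2: app reuses the connection
monitor_check(pool)                            # 3: next check reuses it again
{code}

The key point is that the monitor never holds a connection between checks: it checks one out, runs hello, and checks it back in, so the monitor and the application time-share the same socket.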

In scenarios where the application uses a single thread across many processes (e.g. AWS Lambda), this change would allow 200% more MongoClients than Streaming SDAM and 100% more than Polling SDAM, since each client would need only 1 connection per server instead of 3 (Streaming) or 2 (Polling).

Motivation

Who is the affected end user?

Who are the stakeholders?

How does this affect the end user?

Are they blocked? Are they annoyed? Are they confused?

How likely is it that this problem or use case will occur?

Main path? Edge case?

If the problem does occur, what are the consequences and how severe are they?

Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

Is this issue urgent?

Does this ticket have a required timeline? What is it?

Is this ticket required by a downstream team?

Needed by e.g. Atlas, Shell, Compass?

Is this ticket only for tests?

Does this ticket have any functional impact, or is it just test improvements?



 Comments   
Comment by Shane Harvey [ 28/Mar/23 ]

Some interesting things to consider:

  • With streaming SDAM, the RTT monitor could share a connection from the pool, reducing the overhead of streaming for all users (not just FaaS).
  • With streaming SDAM, the task reading hello responses from the server would essentially pin a connection out of the pool; should we use a dedicated connection there instead?
  • Application pools are paused by default until SDAM discovers the server state and marks the pool "ready". This will need to change, since SDAM would need to check out a connection before knowing the state. Perhaps we could add a CMAP API that adds an externally created connection to the pool (see the sketch after this list)?
  • When using an application socket, are CMAP or command events emitted?
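
One possible shape for the CMAP addition mentioned in the third bullet is sketched below. The check_in_external() method, mark_ready(), and the listener interface are hypothetical names for illustration only, not part of the CMAP specification.

{code:python}
# Hypothetical extension to a CMAP-style pool; check_in_external() is an
# illustrative name, not part of the CMAP specification.

class EventListener:
    """Stand-in for a CMAP connection pool event listener."""
    def connection_created(self, address):
        print("ConnectionCreated", address)

    def connection_checked_in(self, address):
        print("ConnectionCheckedIn", address)


class Pool:
    def __init__(self, address, listener):
        self.address = address
        self.listener = listener
        self.ready = False
        self._idle = []

    def mark_ready(self):
        """Called once SDAM has discovered the server's state."""
        self.ready = True

    def check_in_external(self, conn):
        """Adopt a connection the monitor created before the pool was ready,
        emitting the same events a pool-created connection would."""
        self.listener.connection_created(self.address)
        self._idle.append(conn)
        self.listener.connection_checked_in(self.address)


listener = EventListener()
pool = Pool(("localhost", 27017), listener)
pool.check_in_external(object())  # monitor hands its handshake connection to the pool
pool.mark_ready()                 # pool becomes available for application checkouts
{code}

Whether such adopted connections should emit ConnectionCreated/ConnectionCheckedIn, and whether hello run on them should emit command events, is exactly the open question in the last bullet above.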

Overall this could be a big improvement for FaaS environments, and I think it is in the realm of possibility (I have a quick POC here). My main concern is introducing complexity that might overlap with gRPC.

Comment by Bernie Hackett [ 03/Mar/23 ]

We would have to keep in mind the design for gRPC, in particular any support for automatically upgrading how we connect, from using mongoRPC for the initial connection to using gRPC once we know the deployment supports it.
