Drivers / DRIVERS-1968

Single-threaded SDAM should construct a handshake command when reconnecting after a monitoring or application error/timeout

    • Type: Spec Change
    • Resolution: Won't Do
    • Priority: Unknown
    • None
    • Component/s: Load Balancer, SDAM, Serverless
    • Labels: None
    • Not Needed

      Summary

      While investigating CLOUDP-104364, it was mentioned that the Atlas serverless proxy returns serviceId only in response to the initial handshake, which it identifies by the presence of a client field in the hello command.

      This uncovered a subtle bug in libmongoc (CDRIVER-4207). Errors and timeouts encountered during monitoring would typically result in libmongoc constructing a handshake command when reconnecting; however, errors/timeouts encountered during application usage do not. In single-threaded SDAM, the monitoring and application sockets are one and the same.
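
      The intended behavior can be sketched roughly as follows. This is a minimal illustration, not libmongoc's implementation: the Node class, its fields, and the metadata values are hypothetical. It assumes only what the ticket states, namely that a handshake hello is one that carries a client field, and that a single-threaded driver uses the same socket for monitoring and application traffic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one server in a single-threaded driver.

    Monitoring and application traffic share the same socket, so there is
    only one connection state to track.
    """

    connected: bool = False
    handshake_done: bool = False

    def on_error(self) -> None:
        # Any error or timeout closes the shared socket, whether it came from
        # a monitoring check or from an application operation. Resetting
        # handshake_done here is the point of the ticket.
        self.connected = False
        self.handshake_done = False

    def build_hello(self) -> dict:
        if not self.connected:
            self.connected = True  # stand-in for re-opening the socket
        cmd = {"hello": 1}
        if not self.handshake_done:
            # The first hello on a new socket is a handshake and carries
            # client metadata (values below are placeholders).
            cmd["client"] = {"driver": {"name": "example", "version": "0.0"}}
            self.handshake_done = True
        return cmd
```

      In this sketch, the first build_hello() call and the first call after on_error() both include client, while calls in between do not. The bug described above corresponds to on_error() failing to reset handshake_done when the error originates from application usage.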

      Some suggestions for this ticket:

      • If not already noted in the SDAM spec, state that single-threaded implementations should treat monitoring and application errors the same for purposes of sending a handshake hello during reconnection.
      • Consider noting the Atlas serverless proxy behavior in a drivers specification. I realize we don't have a serverless spec (just Serverless Testing), but perhaps this warrants a note in either the SDAM or Handshake specs.

      Motivation

      Who is the affected end user?

      Single-threaded SDAM implementations.

      How does this affect the end user?

      Users connected to Atlas Serverless that encounter an application error/timeout might get blocked by client-side load balancer errors due to a missing serviceId in subsequent hello responses.
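
      A rough sketch of how the missing serviceId turns into a client-side error follows. All names are hypothetical, and the check is an assumption modeled on the description above rather than code from any driver: the simulated proxy returns serviceId only when the hello carries a client field, and the client rejects any hello reply that lacks it.

```python
class LoadBalancerError(Exception):
    """Hypothetical client-side error for an invalid load balancer reply."""


def proxy_hello_reply(hello_cmd: dict) -> dict:
    # Simulates the proxy behavior described in the ticket: serviceId is
    # returned only for a handshake, i.e. a hello with a client field.
    reply = {"ok": 1}
    if "client" in hello_cmd:
        reply["serviceId"] = "hypothetical-service-id"
    return reply


def check_load_balanced_reply(reply: dict) -> str:
    # Assumed client-side validation: without serviceId the driver cannot
    # treat the peer as a load-balanced service.
    if "serviceId" not in reply:
        raise LoadBalancerError("hello response is missing serviceId")
    return reply["serviceId"]
```

      Under these assumptions, a reconnect that sends a plain hello (no client field) fails the check every time, which matches the blocked state described in this ticket.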

      How likely is it that this problem or use case will occur?

      This may occur after any application error/timeout in a single-threaded driver connected to Atlas Serverless.

      If the problem does occur, what are the consequences and how severe are they?

      The application will likely be blocked for the lifetime of the MongoClient.

      Is this issue urgent?

      The spec change itself is not urgent and does not block libmongoc from fixing its bug.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      No, but it does affect serverless testing since our spec tests trigger socket errors via fail points (see: PHPLIB-717).

            Assignee: Unassigned
            Reporter: Jeremy Mikola (jmikola@mongodb.com)
