Investigate and implement non-retryable error handling


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Developer Tools, Compass

      Use Case

      As a Data Explorer Engineer
      I want the connection attempt to fail fast when encountering non-retryable errors during the initial connection process
      So that I do not spend resources on automatic reconnect attempts by the client

      User Experience

      • When a connection encounters a non-retryable error (such as "Cluster is not in a valid state" with close code 1008) during the initial connection attempt, the attempt should fail immediately and the MongoClient should be closed, which stops the driver's continuous monitoring and retry attempts.

      Risks/Unknowns

      • The non-retryable error handling added in compass-web may have stopped working or never fully covered the initial connection phase
      • The heartbeat failed listener isn't set up until after successful connection, so errors during initial connection aren't caught
      • Need to ensure we don't break other connection scenarios while implementing fail-fast behavior
      • We are going to hard-code a list of codes in the client, and that list needs to stay in sync with the server implementation
      • Performance impact should be negligible

      Acceptance Criteria

      Implementation Requirements

      • Add error handling for non-retryable errors (close code 1008 and others) during the initial connection process, and close the MongoClient when one occurs
        • Follow patterns from devtools shared connect package for handling fail-fast during connection process
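
      A minimal sketch of the fail-fast behavior described above, not the devtools connect implementation. Everything named here is an assumption for illustration: the `NON_RETRYABLE_CLOSE_CODES` set (only 1008 is named in this ticket), the `code` property on the error, the `ClosableClient` interface, and the `subscribe` callback used to receive errors raised while connecting.

```typescript
// Hypothetical list: only 1008 appears in this ticket. Any other codes
// would have to be copied from the server implementation and kept in sync.
const NON_RETRYABLE_CLOSE_CODES: ReadonlySet<number> = new Set([1008]);

// Assumed error shape: the ticket does not say how the close code is
// exposed on the thrown error, so `code` is a guess.
interface CloseCodeError extends Error {
  code?: number;
}

function isNonRetryableError(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  const code = (err as CloseCodeError).code;
  return typeof code === 'number' && NON_RETRYABLE_CLOSE_CODES.has(code);
}

// Minimal slice of a client; MongoClient has connect() and close().
interface ClosableClient {
  connect(): Promise<unknown>;
  close(): Promise<void>;
}

// Races connect() against the first non-retryable error delivered through
// `subscribe`. `subscribe` is a hypothetical hook that registers a listener
// and returns a function that removes it.
async function connectFailFast(
  client: ClosableClient,
  subscribe: (listener: (err: unknown) => void) => () => void
): Promise<void> {
  let unsubscribe: () => void = () => {};
  const fatal = new Promise<never>((_, reject) => {
    unsubscribe = subscribe((err) => {
      if (isNonRetryableError(err)) reject(err);
    });
  });
  try {
    await Promise.race([client.connect(), fatal]);
  } catch (err) {
    // Design choice in this sketch: close on any failed initial attempt so
    // the client stops monitoring; errors from close() itself are ignored.
    await client.close().catch(() => undefined);
    throw err;
  } finally {
    unsubscribe();
  }
}
```

      The listener is registered before `connect()` is awaited, which is the point of the sketch: errors raised during the initial attempt are observed instead of only those raised after a successful connection.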

      Testing Requirements

      • Add an integration or unit test with mocks to devtools connect that exercises the error we expect the TLS polyfill to throw in the scenarios we're trying to handle
      • Add a new synthetic test that covers the end-to-end error toast behavior we expect from this improvement. The error should appear much sooner than 30 seconds (serverSelectionTimeoutMS)
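
      One way a test could check the "much sooner than 30 seconds" expectation is a small timing helper. This is a sketch only: `expectRejectsWithin` is a hypothetical name, and the budget passed to it would be chosen by the test author.

```typescript
// Hypothetical helper: resolves with the rejection reason if `promise`
// rejects within `budgetMs`; rejects if it resolves or takes too long.
function expectRejectsWithin(
  promise: Promise<unknown>,
  budgetMs: number
): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`no rejection within ${budgetMs} ms`)),
      budgetMs
    );
    promise.then(
      () => {
        clearTimeout(timer);
        reject(new Error('expected a rejection, but the promise resolved'));
      },
      (reason) => {
        clearTimeout(timer);
        resolve(reason);
      }
    );
  });
}
```

      A test would pass the connection attempt's promise and a budget well below serverSelectionTimeoutMS, then assert on the returned reason.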

            Assignee:
            Neal Beeken
            Reporter:
            Simon Zhu
