Investigate and fix non-retryable error handling


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: None
    • Environment:
      OS:
      node.js / npm versions:
      Additional info:
    • Developer Tools, Compass

      The non-retryable error handling we added for compass-web (https://github.com/mongodb-js/compass/blob/a9ea81e89378d6205ae664f858a23c7135fe3a46/packages/compass-connections/src/stores/connections-store-redux.ts#L1410-L1419) seems to have stopped working.

      Look at this Sentry trace: https://mongodb-org.sentry.io/dashboards/trace/d240e2adba4a499bb67791de3fad6629/?dashboardId=142094&environment=prod&fov=17642.443549603224%2C60.02953439950943&node=span-3627092fbd124f13&project=4505240668733440&project=4509401391628288&source=dashboards&statsPeriod=3d&timestamp=1756854234&widgetId=1150500

      The connection attempt times out after 30 seconds, yet every WebSocket that gets opened is closed with a "Cluster is not in a valid state" error (see screenshots).

      This maps to WebSocket close code 1008 (Policy Violation), which we also see in the trace (again, see screenshot).
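      As a minimal sketch of the classification the handler is expected to make, the TypeScript below treats close code 1008 as non-retryable. `NON_RETRYABLE_CLOSE_CODES`, `isNonRetryableClose`, and `describeClose` are hypothetical names, and the set is an assumption that contains only the code seen in this trace; the actual list lives in the connections-store-redux.ts link above and may include other codes.

```typescript
// Standard WebSocket close code names (RFC 6455, section 7.4.1) relevant here.
const CLOSE_CODE_NAMES: Record<number, string> = {
  1000: "Normal Closure",
  1006: "Abnormal Closure",
  1008: "Policy Violation",
};

// Hypothetical set of close codes that should stop retries. Only 1008 is
// confirmed by this ticket; the real list in Compass may differ.
const NON_RETRYABLE_CLOSE_CODES: ReadonlySet<number> = new Set([1008]);

function isNonRetryableClose(code: number): boolean {
  return NON_RETRYABLE_CLOSE_CODES.has(code);
}

// Builds a human-readable summary of a close event for logging.
function describeClose(code: number, reason: string): string {
  const name = CLOSE_CODE_NAMES[code] ?? "Unknown";
  const kind = isNonRetryableClose(code) ? "non-retryable" : "retryable";
  return `${code} (${name}, ${kind}): ${reason}`;
}

console.log(describeClose(1008, "Cluster is not in a valid state"));
// prints "1008 (Policy Violation, non-retryable): Cluster is not in a valid state"
```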

      This seems to be one of the codes that the Compass code handles, so it should cause the Data Service to disconnect. Instead, the driver keeps retrying over and over, so something is wrong.
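      To make the expected behavior concrete, here is a synchronous TypeScript sketch (not the driver's actual retry logic) of a retry loop that stops on the first non-retryable close code instead of retrying until its budget runs out. `connectWithRetries`, `AttemptResult`, and the `maxAttempts` budget are hypothetical stand-ins for the driver retrying within the 30-second timeout.

```typescript
// Outcome of a single (simulated) connection attempt.
type AttemptResult = { ok: true } | { ok: false; closeCode: number };

interface RetryOutcome {
  connected: boolean;
  attempts: number;
  // True when we stopped because of a non-retryable close code.
  abortedEarly: boolean;
}

function connectWithRetries(
  attempt: () => AttemptResult,
  maxAttempts: number,
  isNonRetryable: (closeCode: number) => boolean
): RetryOutcome {
  for (let i = 1; i <= maxAttempts; i++) {
    const result = attempt();
    if (result.ok) {
      return { connected: true, attempts: i, abortedEarly: false };
    }
    if (isNonRetryable(result.closeCode)) {
      // Expected behavior: give up (and disconnect) right away.
      return { connected: false, attempts: i, abortedEarly: true };
    }
    // Retryable failure: loop and try again.
  }
  // Budget exhausted, mirroring the timeout seen in the trace.
  return { connected: false, attempts: maxAttempts, abortedEarly: false };
}

// Every attempt is closed with 1008, as in the Sentry trace.
const alwaysPolicyViolation = (): AttemptResult => ({ ok: false, closeCode: 1008 });

const expected = connectWithRetries(alwaysPolicyViolation, 10, (code) => code === 1008);
const observed = connectWithRetries(alwaysPolicyViolation, 10, () => false);
console.log(expected); // { connected: false, attempts: 1, abortedEarly: true }
console.log(observed); // { connected: false, attempts: 10, abortedEarly: false }
```

      The second call reproduces the symptom: each attempt is closed with 1008, but nothing treats that as fatal, so retries continue until the budget is exhausted.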

      We don't have e2e tests for this code path; let's add them here.

      The connection in the stack trace never connected successfully, so our listener for the heartbeat-failed event isn't set up yet. As a result, this error isn't caught and we don't disconnect. We should also handle this error while we're fetching the instance information.
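      The listener-ordering problem can be sketched with a plain Node.js `EventEmitter` standing in for the driver client. The sketch assumes the client emits `serverHeartbeatFailed` (a MongoDB Node.js driver monitoring event) and, as a simplification, that the failure carries a `closeCode`; the real event exposes an `Error` as `failure`, and how Compass maps it back to a close code is not shown. `watchNonRetryable` and `simulate` are hypothetical helpers. One way to catch the error before the first successful connection is to attach the watcher as soon as the client exists.

```typescript
import { EventEmitter } from "node:events";

// Simplified, hypothetical payload: the real driver event carries `failure: Error`.
interface HeartbeatFailedEvent {
  failure: { closeCode?: number };
}

const NON_RETRYABLE = new Set([1008]); // assumption: only the code from this ticket

// Hypothetical helper: calls onFatal once for the first non-retryable failure,
// then detaches itself. Returns a cleanup function for normal teardown.
function watchNonRetryable(
  client: EventEmitter,
  onFatal: (code: number) => void
): () => void {
  const listener = (event: HeartbeatFailedEvent) => {
    const code = event.failure.closeCode;
    if (code !== undefined && NON_RETRYABLE.has(code)) {
      client.off("serverHeartbeatFailed", listener);
      onFatal(code);
    }
  };
  client.on("serverHeartbeatFailed", listener);
  return () => client.off("serverHeartbeatFailed", listener);
}

// Simulates a client whose heartbeats fail with 1008 before it ever connects.
function simulate(attachEarly: boolean): number[] {
  const client = new EventEmitter();
  const fatalCodes: number[] = [];
  const onFatal = (code: number) => fatalCodes.push(code);

  if (attachEarly) {
    // Fixed ordering: watch from the moment the client exists.
    watchNonRetryable(client, onFatal);
  } else {
    // Buggy ordering: watch only after a successful connection.
    client.once("connected", () => watchNonRetryable(client, onFatal));
  }

  // The server closes every socket with 1008, so "connected" never fires.
  client.emit("serverHeartbeatFailed", { failure: { closeCode: 1008 } });
  client.emit("serverHeartbeatFailed", { failure: { closeCode: 1008 } });
  return fatalCodes;
}

console.log(simulate(false)); // [] (error missed, retries continue)
console.log(simulate(true)); // [ 1008 ] (caught once, listener removed)
```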

      Added the listener in https://github.com/mongodb-js/compass/pull/6598/files#diff-018a2f2c6c0aff913162a6568095ab43fb5d67249a66a9f30ae26c4ea61d3daaR1729 

            Assignee:
            Jack Weir
            Reporter:
            Simon Zhu
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: