DRIVERS-1842

Drivers should retry authentication errors when connection handshake fails

    • To Do
    • Needed

      Summary

      A customer had one mongos that could not reach the LDAP server (due to a transient network issue) and therefore failed to authenticate new connections. The other mongos was unaffected. Can drivers treat the connection handshake as failed when external authentication is not possible?

      In our repro, blocking the ports to the LDAP server gave an error like:

      Caused by: com.mongodb.MongoCommandException: Command failed with error 18 (AuthenticationFailed): 'Authentication failed.' on server 192.168.1.122:27017. The full response is {"ok": 0.0, "errmsg": "Authentication failed.", "code": 18, "codeName": "AuthenticationFailed", "operationTime": {"$timestamp": {"t": 1610717564, "i": 2}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1610717564, "i": 2}}, "signature": {"hash": {"$binary": "VlqG0NMZ2vycHdc1jR1u6Zvika4=", "$type": "00"}, "keyId": {"$numberLong": "6917665203874168863"}}}}
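      To make the request concrete, here is a minimal sketch of how a driver might recognize a reply like the one above as an authentication failure that occurred during connection establishment. The function name and the `during_handshake` flag are hypothetical and not part of any driver API. Note that the reply alone does not say whether the cause was bad credentials or an unreachable LDAP server.

```python
AUTHENTICATION_FAILED = 18  # error code taken from the response shown above


def is_handshake_auth_failure(reply: dict, during_handshake: bool) -> bool:
    """Return True for an AuthenticationFailed reply seen while connecting.

    Hypothetical helper: `during_handshake` is assumed to be tracked by the
    caller; it is not a field of the server reply.
    """
    return (
        during_handshake
        and reply.get("ok") == 0
        and reply.get("code") == AUTHENTICATION_FAILED
    )


# Trimmed copy of the reply from the repro.
reply = {
    "ok": 0.0,
    "errmsg": "Authentication failed.",
    "code": 18,
    "codeName": "AuthenticationFailed",
}
```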
      

      If the driver could blacklist the failing mongos for X seconds and then retry the operation via a healthy mongos, we would avoid this specific failure.
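      The proposed behavior could be sketched roughly as follows. This is an illustration under stated assumptions, not a description of how any driver works today: `HandshakeAuthError`, `MongosSelector`, `run_with_retry`, and `cooldown_seconds` (the "X seconds" above) are all hypothetical names introduced for this example.

```python
import time
from typing import Callable, Dict, List, Optional


class HandshakeAuthError(Exception):
    """Hypothetical error: authentication failed while opening a connection."""


class MongosSelector:
    """Tracks mongos addresses that are temporarily excluded (illustrative)."""

    def __init__(self, addresses: List[str], cooldown_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._addresses = list(addresses)
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._excluded_until: Dict[str, float] = {}

    def mark_failed(self, address: str) -> None:
        # Exclude this mongos until the cooldown has elapsed.
        self._excluded_until[address] = self._clock() + self._cooldown

    def candidates(self) -> List[str]:
        # Addresses that were never excluded or whose cooldown has expired.
        now = self._clock()
        return [a for a in self._addresses
                if self._excluded_until.get(a, float("-inf")) <= now]


def run_with_retry(selector: MongosSelector, operation: Callable[[str], object]):
    """Try each currently eligible mongos once, excluding any that fail auth."""
    last_error: Optional[Exception] = None
    for address in selector.candidates():
        try:
            return operation(address)
        except HandshakeAuthError as exc:
            selector.mark_failed(address)
            last_error = exc
    if last_error is not None:
        raise last_error
    raise RuntimeError("no mongos currently eligible")
```

      With two mongos addresses, an operation that fails authentication on the first is retried on the second, and the first stays out of `candidates()` until the cooldown passes. Passing `clock` explicitly is only there to make the cooldown testable without sleeping.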

      Motivation

      Who is the affected end user?

      Who are the stakeholders?

      How does this affect the end user?

      Are they blocked? Are they annoyed? Are they confused?

      How likely is it that this problem or use case will occur?

      Main path? Edge case?

      If the problem does occur, what are the consequences and how severe are they?

      Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

      Is this issue urgent?

      Does this ticket have a required timeline? What is it?

      Is this ticket required by a downstream team?

      Needed by e.g. Atlas, Shell, Compass?

      Is this ticket only for tests?

      Does this ticket have any functional impact, or is it just test improvements?

      Cast of Characters

      Engineering Lead:
      Document Author:
      POCers:
      Product Owner:
      Program Manager:
      Stakeholders:

      Channels & Docs

      Slack Channel

      [Scope Document|some.url]

      [Technical Design Document|some.url]

            Assignee:
            Unassigned
            Reporter:
            rachelle.palmer@mongodb.com Rachelle Palmer
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: