[DRIVERS-2140] Clarify Auth Spec and Clean Up Error Section
Created: 15/Apr/21  Updated: 31/Mar/22

Status: Backlog
Project: Drivers
Component/s: Authentication
Fix Version/s: None

Type: Spec Change
Priority: Major - P3
Reporter: Rachelle Palmer
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DRIVERS-746 Drivers should retry operations if co... Implementing
related to DRIVERS-1571 Direct read/write retries to another ... Implementing
Driver Changes: Needed

 Description   

Failure scenario:

  • A customer has multiple mongos
  • 1 mongos is in a failure state and cannot be connected to for authentication

Asks:

  • detect an unhealthy mongos and reroute operations to healthy sibling mongos (DRIVERS-1571)
  • retry authentication against unhealthy member (DRIVERS-1476)
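To make the first ask concrete, here is a minimal sketch of rerouting a failed operation to a sibling mongos. It is only an illustration under stated assumptions: `AuthError`, `pick`, `run_with_reroute`, and the host names are hypothetical and are not taken from any driver or from the spec.

```python
class AuthError(Exception):
    """Hypothetical error raised when authentication against a mongos fails."""


def pick(servers, deprioritized):
    # Prefer any server that has not just failed; fall back to the full
    # list if every server is deprioritized.
    preferred = [s for s in servers if s not in deprioritized]
    return (preferred or list(servers))[0]


def run_with_reroute(servers, operation):
    # First attempt: no server is deprioritized yet.
    first = pick(servers, set())
    try:
        return operation(first)
    except AuthError:
        # Single retry: steer away from the mongos that just failed.
        second = pick(servers, {first})
        return operation(second)


def example_operation(server):
    # Assumption for the demo: "mongos-a" is the unhealthy member.
    if server == "mongos-a":
        raise AuthError(f"cannot authenticate against {server}")
    return f"ok from {server}"


result = run_with_reroute(["mongos-a", "mongos-b"], example_operation)
```

With a single unhealthy mongos, the retry lands on the healthy sibling. If every mongos fails, the second error propagates to the caller.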

While opening this ticket I reviewed https://github.com/mongodb/specifications/blob/master/source/auth/auth.rst#authentication-handshake and found that our spec needs a few other tweaks:
1) There is a "todo: errors" section that appears to be incomplete.
2) We probably shouldn't use dev@mongodb.com as an example, since I believe that is our CEO's name/email.
3) A flowchart of some kind would help: even after reading a lot of text, I still couldn't determine what happens when authentication fails, whether retries happen, or how long those retries continue.



 Comments   
Comment by Rachelle Palmer [ 16/Apr/21 ]

I think I agree that it's duplicative... I'm going to transform this into a spec ticket and strip out the duplicate parts. Still relevant that our spec is incomplete/needs edits.

Comment by Shane Harvey [ 15/Apr/21 ]

I agree this seems to be a dup of DRIVERS-1571. DRIVERS-746 is also related.

Per the SDAM spec here we mark the server as unknown during the auth failure, but it's not clear if that includes a mongos auth failure

Yes, an auth failure from mongos is the same as an auth failure from mongod, so the driver will mark the mongos unknown. Assuming that the mongos is still reporting itself as healthy, SDAM will rediscover the node automatically soon after (anywhere from 0 to 10 seconds later). Note that this behavior was recently changed in DRIVERS-1476. Before DRIVERS-1476, the driver would not mark the server unknown after an auth error.

So after DRIVERS-1476 is implemented, drivers will select the unhealthy mongos less frequently. The state will flip-flop between healthy and unknown. This isn't perfect but it's better than the old behavior where the server would stay in the healthy state forever.
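The flip-flop described above can be sketched roughly as follows. This is a toy model, not driver code: `Server`, `Topology`, `handle_auth_error`, and `run_monitor_check` are hypothetical names, and the monitor check stands in for the periodic SDAM check that rediscovers the node.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    address: str
    auth_works: bool = True   # assumption: models the broken mongos
    state: str = "Mongos"     # "Mongos" (selectable) or "Unknown"


class Topology:
    def __init__(self, servers):
        self.servers = {s.address: s for s in servers}

    def select_server(self, rng):
        # Unknown servers are never selected for operations.
        candidates = [s for s in self.servers.values() if s.state != "Unknown"]
        if not candidates:
            raise RuntimeError("no selectable servers")
        return rng.choice(candidates)

    def handle_auth_error(self, server):
        # Behavior after DRIVERS-1476: an auth failure marks the server unknown.
        server.state = "Unknown"

    def run_monitor_check(self):
        # The mongos still reports itself as healthy to monitoring,
        # so the next check flips it back to a selectable state.
        for s in self.servers.values():
            s.state = "Mongos"


def run_operation(topology, rng):
    server = topology.select_server(rng)
    if not server.auth_works:
        topology.handle_auth_error(server)
        return False, server.address
    return True, server.address


topo = Topology([Server("a:27017"), Server("b:27017", auth_works=False)])
bad = topo.servers["b:27017"]
topo.handle_auth_error(bad)   # auth fails on b: b becomes Unknown
rng = random.Random(0)
picked_while_unknown = {topo.select_server(rng).address for _ in range(20)}
topo.run_monitor_check()      # monitoring rediscovers b as healthy
```

Between the auth failure and the next monitor check, only the healthy mongos is selectable. After the check, the unhealthy mongos becomes selectable again until its next auth failure.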

Generated at Thu Feb 08 08:24:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.