[SERVER-9110] checkAuth can cause step down on Authentication failure Created: 25/Mar/13  Updated: 10/Dec/14  Resolved: 28/May/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andre de Frere Assignee: Eric Milkie
Resolution: Cannot Reproduce Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-3715 re-adding member to replica set witho... Closed
Operating System: ALL
Participants:

 Description   

Manager::checkAuth in manager.cpp will cause a Primary to step down if there is an auth failure and no nodes are marked as up. This behavior is intended mainly for reconfigs, so that the authentication problem gets the user's attention; however, it has also been seen to bring down an otherwise functioning Primary in a replica set.
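The condition described above can be sketched as a small predicate. This is a hypothetical illustration, not the actual manager.cpp code: the `MemberHealth` enum and `shouldStepDownOnAuthFailure` function are invented names, and the sketch only models the auth-failure rule, not the primary's other step-down checks.

```cpp
#include <vector>

// Hypothetical health states a primary might record for each remote member.
enum class MemberHealth { Up, Unreachable, AuthFailed };

// Sketch of the rule in this report: step down if at least one remote member
// failed authentication and no remote member is currently marked as up.
bool shouldStepDownOnAuthFailure(const std::vector<MemberHealth>& remotes) {
    bool anyAuthFailure = false;
    bool anyUp = false;
    for (MemberHealth h : remotes) {
        if (h == MemberHealth::AuthFailed) anyAuthFailure = true;
        if (h == MemberHealth::Up) anyUp = true;
    }
    return anyAuthFailure && !anyUp;
}
```

In a three-member set where one secondary rejects the primary's credentials and the other is briefly unreachable, `shouldStepDownOnAuthFailure({MemberHealth::AuthFailed, MemberHealth::Unreachable})` is true, which matches the reported case of a working Primary stepping down.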

There doesn't seem to be a benefit to stepping down on auth errors, especially in the case where a node has been operating as Primary for some time.



 Comments   
Comment by Eric Milkie [ 28/May/13 ]

I was unable to find any failure case for version 2.4, where the reconnect and auth code was significantly rewritten.

Comment by Eric Milkie [ 08/May/13 ]

Due to SERVER-3715, we can't change the behavior of stepping down when there is an authentication failure (how would you know which server is authoritative when there is a disagreement over who has the correct credentials for the internal system user?).
There might be an issue with reconnecting a broken connection and restoring the proper credentials after a transient network failure. I am looking for more information on this specific failure case.

Comment by Eric Milkie [ 25/Mar/13 ]

I'd like to treat inaccessible secondaries the same as unreachable secondaries.
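One way to read this proposal is that a secondary which rejects the primary's credentials would simply count as not visible, and the primary would then apply the usual rule of stepping down only when it can no longer see a majority of the set. The sketch below assumes that interpretation, one vote per member, and no arbiters or priority rules; all names are hypothetical and do not come from the server source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical health states a primary might record for each remote member.
enum class MemberHealth { Up, Unreachable, AuthFailed };

// Proposal sketch: an auth-failed member is treated exactly like an
// unreachable one, i.e. only members marked Up count as visible.
bool countsAsVisible(MemberHealth h) {
    return h == MemberHealth::Up;
}

// Step down only when the primary (which always sees itself) cannot see a
// strict majority of the set. Assumes every member has exactly one vote.
bool primaryShouldStepDown(const std::vector<MemberHealth>& remotes) {
    std::size_t visible = 1;  // count the primary itself
    for (MemberHealth h : remotes) {
        if (countsAsVisible(h)) ++visible;
    }
    std::size_t setSize = remotes.size() + 1;
    return visible * 2 <= setSize;  // true when visible is not a strict majority
}
```

Under this rule, a three-member set with one auth-failed secondary and one healthy secondary keeps its Primary, while an auth-failed secondary plus an unreachable one causes a step-down, the same outcome as two unreachable secondaries.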

Generated at Thu Feb 08 03:19:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.