[SERVER-40776] Upgrading from 3.4 to 3.6 breaks internal membership authentication Created: 23/Apr/19  Updated: 29/Jul/19  Resolved: 29/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.16, 3.6.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Cassio Mosqueira Assignee: Danny Hatcher (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File m1.log, Text File m2.log, Text File m4.log
Operating System: ALL
Participants:

Description

I have a 3-member replica set running on Windows that uses SSL for internal membership authentication, and it has been working well for a couple of years.

I have never had issues upgrading before, but this time, when I upgraded one of the secondaries to 3.6, the upgraded secondary became unable to authenticate the other members that are still on 3.4.

Here is the log message on the member that was upgraded to 3.6 (m1.mydomain.com):

2019-04-21T00:12:19.123Z I ACCESS   [conn7] Failed to authenticate CN=m2.mydomain.com,OU=Dept1,O=MyDomain,ST=NY,C=US@$external from client 162.221.55.62:53006 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=m2.mydomain.com,OU=Dept1,O=MyDomain,ST=NY,C=US@$external
2019-04-21T00:12:19.173Z I ACCESS   [conn7] Unauthorized: not authorized on admin to execute command { replSetHeartbeat: "myReplicaSet", configVersion: 438347, from: "m2.mydomain.com:40000", fromId: 3, term: 644, $replData: 1, $db: "admin" }

On the members that were not upgraded, I saw many instances of this message:

2019-04-20T10:51:47.427Z I REPL     [ReplicationExecutor] Error in heartbeat request to m1.mydomain.com:40000; Unauthorized: not authorized on admin to execute command { replSetHeartbeat: "myReplicaSet", configVersion: 438343, from: "m4.mydomain.com:40000", fromId: 6, term: 640, $replData: 1, $db: "admin" }

My certificate subjects look correct, and they work fine on version 3.4:

CN=m4.mydomain.com,OU=Dept1,O=MyDomain,ST=NY,C=US
CN=m2.mydomain.com,OU=Dept1,O=MyDomain,ST=NY,C=US
CN=m1.mydomain.com,OU=Dept1,O=MyDomain,ST=NY,C=US

Here is the config file for the replica set members (this one is from m1):

storage:  
    dbPath: c:\mongossl\data
systemLog:  
    destination: file
    path: c:\mongossl\log\mongod.log
    logAppend: true
    timeStampFormat: iso8601-utc
replication:  
    replSetName: myReplicaSet
net:  
    port: 40000
    bindIpAll: true
    ssl:
        mode: preferSSL
        PEMKeyFile: c:\certs\m1.pem
        CAFile: c:\certs\ca.crt
        clusterFile: c:\certs\m1.pem
security:  
    authorization: disabled
    clusterAuthMode: x509

The issue doesn't happen if I add transitionToAuth to the security section of the config file. 
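
For reference, a sketch of what the security section looks like with that option added (assuming the boolean form of security.transitionToAuth; the rest of the file is unchanged):

security:
    authorization: disabled
    clusterAuthMode: x509
    transitionToAuth: true    # accept both authenticated and unauthenticated cluster connections during the rolling upgrade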

Comments
Comment by Danny Hatcher (Inactive) [ 29/Jul/19 ]

Closing due to lack of response.

Comment by Danny Hatcher (Inactive) [ 14/May/19 ]

I've been trying to reproduce this situation, but without success: I'm able to upgrade from 3.4.19 to 3.6.12 using the same config settings as you, with no issues.

Would you be able to try this process in a new replica set? It can be empty; I just want to see if this is a problem specifically for that cluster or in your environment in general.
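
As a sketch, a throwaway test member could reuse the same SSL/x509 settings as your production config; all paths, the port, and the set name below are placeholders:

storage:
    dbPath: c:\mongotest\data          # scratch data directory (placeholder)
systemLog:
    destination: file
    path: c:\mongotest\log\mongod.log  # scratch log path (placeholder)
replication:
    replSetName: testReplicaSet        # throwaway set name (placeholder)
net:
    port: 50000                        # any free port
    bindIpAll: true
    ssl:
        mode: preferSSL
        PEMKeyFile: c:\certs\m1.pem    # same certificates and CA as the production set
        CAFile: c:\certs\ca.crt
        clusterFile: c:\certs\m1.pem
security:
    authorization: disabled
    clusterAuthMode: x509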

Comment by Cassio Mosqueira [ 09/May/19 ]

When I downgrade m1, it starts working again immediately. I just have to remove the bindIpAll setting from the config file, because 3.4 will not start with it.
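
For reference, a sketch of the 3.4-compatible net section (setting bindIp explicitly is illustrative; the rest is the same as on 3.6):

net:
    port: 40000
    bindIp: 0.0.0.0    # explicit 3.4-compatible replacement for bindIpAll: true
    ssl:
        mode: preferSSL
        PEMKeyFile: c:\certs\m1.pem
        CAFile: c:\certs\ca.crt
        clusterFile: c:\certs\m1.pem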

The first time I tried to upgrade, I hadn't noticed the authentication issue, and upgraded m2 too. That brought the replica set down. I can't try that again easily because this is a production db.

Comment by Danny Hatcher (Inactive) [ 09/May/19 ]

If you immediately downgrade the m1 node from the non-working state, does it start working again without any other intervention? If you complete the upgrade on another node, can that node connect to m1 then? I'm trying to determine whether this is an issue purely with the transition period or whether internal auth is permanently broken.

Comment by Cassio Mosqueira [ 09/May/19 ]

I just tried to upgrade one node again (m1) from 3.4 to 3.6 and the same issue occurred. I have uploaded the log files for the 3 members during the upgrade process.

Comment by Cassio Mosqueira [ 06/May/19 ]

I didn't have time to try again. I will attempt to upgrade a node next week and upload the logs.

Comment by Danny Hatcher (Inactive) [ 06/May/19 ]

Are you still having this issue?

Comment by Danny Hatcher (Inactive) [ 24/Apr/19 ]

Hello Cassio,

Could you attach the logs from all members to this ticket? Ideally the logs would include the time the nodes were on 3.4 as well as when you upgraded.

Thanks,

Danny
