[SERVER-38269] Failed to upgrade to MongoDB 4.0.4 on Windows Created: 27/Nov/18  Updated: 03/Dec/18  Resolved: 29/Nov/18

Status: Closed
Project: Core Server
Component/s: Upgrade/Downgrade
Affects Version/s: 4.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Itzhak Kagan Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File Rep3ServerOneArbiter.cfg     File Rep3ServerOneMember1.cfg     File Rep3ServerTwoMember2.cfg     File ServerOneArbiter.cfg     File ServerOneMember1.cfg     File ServerOneMember2.cfg     File ServerTwoArbiter.cfg     File ServerTwoMember1.cfg     File ServerTwoMember2.cfg     Text File rs-status-result.txt    
Operating System: ALL
Participants:

 Description   

Base line:

There are two servers (ServerOne, ServerTwo) with the following configuration:
OS: Windows server 2012R2
RAM: 3GB
Installed: latest version of vc_redist.x64.exe (relates to visual studio 2017)
MongoDB version: 3.4.6
FCV: 3.4
Engine: wiredTiger
SSL is used: (mode: requireSSL)
security: (clusterAuthMode: x509)
All replica sets are Windows services

Each server hosts its own replica set comprising two data members plus an arbiter (Rep1 on ServerOne and Rep2 on ServerTwo).
A third replica set (Rep3) also exists. It consists of a data member plus an arbiter on ServerOne and a data member on ServerTwo.

Schema:

ServerOne win services:
1. Rep1 Data Member 1
2. Rep1 Data Member 2
3. Rep1 Arbiter
4. Rep3 Data Member
5. Rep3 Arbiter

ServerTwo win services:
1. Rep2 Data Member 1
2. Rep2 Data Member 2
3. Rep2 Arbiter
4. Rep3 Data Member

Goal:
We decided to upgrade all replica sets to version: 4.0.4.

Steps:

Step one: We upgraded all replica sets to version 3.6.8 and set the FCV to 3.6
That was finished successfully.
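
The stepwise path described here (3.4 to 3.6 to 4.0, setting the FCV after each binary upgrade) can be sketched as a small planning helper. This is only an illustration: the `RELEASES` list and the `upgrade_plan` function are hypothetical names, not part of MongoDB or of this report, and the sketch covers only the three release series mentioned in this ticket.

```python
# Release series mentioned in this report, oldest first (assumption: no others needed).
RELEASES = ["3.4", "3.6", "4.0"]


def upgrade_plan(current_fcv, target):
    """List the binary-upgrade and FCV steps needed to go from current_fcv to target."""
    start = RELEASES.index(current_fcv)
    end = RELEASES.index(target)
    if end < start:
        raise ValueError("target release is older than the current FCV")
    steps = []
    # Each release series is visited in order; none may be skipped.
    for release in RELEASES[start + 1 : end + 1]:
        steps.append(f"replace binaries with {release}.x and restart")
        steps.append(f"set featureCompatibilityVersion to {release}")
    return steps


for step in upgrade_plan("3.4", "4.0"):
    print(step)
```

For a replica set starting at FCV 3.4, this yields four steps, matching Step one and Step two of this report.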

Step Two: We upgraded Rep1 and Rep2 to version 4.0.4 and set the FCV to 4.0
That also finished successfully.
Then we started to upgrade Rep3. The binaries for all members were replaced with version 4.0.4 and the services restarted successfully.
When I opened the mongo shell on the Rep3 primary (on ServerOne) and ran rs.status(), an error was displayed for the other data member (located on ServerTwo):
"lastHeartbeatMessage" : "Error connecting to ServerTwo:27011 (10.36.151.137:27011) :: caused by :: The Local Security Authority cannot be contacted"

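As a rough illustration of how such a failure can be picked out of a larger rs.status() result, the following Python sketch filters the members array for entries that carry a non-empty lastHeartbeatMessage. The function name and the sample document are hypothetical: the sample is a trimmed-down stand-in, the ServerOne port and the set name are assumptions, and only the ServerTwo message is taken from this report.

```python
def members_with_heartbeat_errors(status):
    """Return (name, message) pairs for members whose last heartbeat reported an error."""
    failing = []
    for member in status.get("members", []):
        # Healthy members either omit the field or carry an empty string.
        message = member.get("lastHeartbeatMessage", "")
        if message:
            failing.append((member.get("name"), message))
    return failing


# Hypothetical, trimmed-down rs.status() result (arbiter omitted).
sample_status = {
    "set": "Rep3",  # assumed set name
    "members": [
        {"name": "ServerOne:27011", "stateStr": "PRIMARY"},  # port assumed
        {
            "name": "ServerTwo:27011",
            "stateStr": "(not reachable/healthy)",
            "lastHeartbeatMessage": (
                "Error connecting to ServerTwo:27011 (10.36.151.137:27011) "
                ":: caused by :: The Local Security Authority cannot be contacted"
            ),
        },
    ],
}

for name, message in members_with_heartbeat_errors(sample_status):
    print(f"{name}: {message}")
```
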
Attached files:
Rep1 on ServerOne:
ServerOneMember1.cfg
ServerOneMember2.cfg
ServerOneArbiter.cfg

Rep2 on ServerTwo:
ServerTwoMember1.cfg
ServerTwoMember2.cfg
ServerTwoArbiter.cfg

Rep3 on ServerOne + ServerTwo:
Rep3ServerOneMember1.cfg
Rep3ServerTwoMember2.cfg
Rep3ServerOneArbiter.cfg

Result of rs.status() on replica set three:
rs-status-result.txt

 

Thanks,

Itzik



 Comments   
Comment by Danny Hatcher (Inactive) [ 03/Dec/18 ]

Hello Itzik,

I'm glad to hear that you were able to resolve your issue. Please note that per our documentation, if your clusterFile is identical to your PEMKeyFile you can omit the clusterFile setting as x509 will default to using the PEMKeyFile.
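
The fallback described above can be sketched in Python as follows. This is a minimal illustration over an already-parsed configuration dictionary, not MongoDB's actual implementation; the function names and the certificate path are hypothetical.

```python
def effective_cluster_file(config):
    """Return the certificate used for membership auth: clusterFile, else PEMKeyFile."""
    ssl = config.get("net", {}).get("ssl", {})
    return ssl.get("clusterFile") or ssl.get("PEMKeyFile")


def cluster_file_is_redundant(config):
    """True when clusterFile is set but merely repeats PEMKeyFile."""
    ssl = config.get("net", {}).get("ssl", {})
    cluster = ssl.get("clusterFile")
    return cluster is not None and cluster == ssl.get("PEMKeyFile")


# Hypothetical parsed config; the path is a placeholder, not taken from the attachments.
config = {
    "security": {"clusterAuthMode": "x509"},
    "net": {
        "ssl": {
            "mode": "requireSSL",
            "PEMKeyFile": "C:/certs/member.pem",
            "clusterFile": "C:/certs/member.pem",
        }
    },
}

print(effective_cluster_file(config))
print(cluster_file_is_redundant(config))
```

When `cluster_file_is_redundant` returns true, dropping the clusterFile entry leaves the effective certificate unchanged.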

Thank you,

Danny

Comment by Itzhak Kagan [ 03/Dec/18 ]

OK, I understood my mistake.

I thought that the "clusterFile" should be the same in all config files, and it turns out that is not so!

When I changed the "clusterFile" to be the same as the "PEMKeyFile", it started working.

 

Thanks,

Itzik

Comment by Itzhak Kagan [ 30/Nov/18 ]

Refer to: https://docs.mongodb.com/manual/tutorial/configure-x509-member-authentication/#x509-internal-authentication

security:
   clusterAuthMode: x509
net:
   ssl:
      mode: requireSSL
      PEMKeyFile: <path to TLS/SSL certificate and key PEM file>
      CAFile: <path to root CA PEM file>
      clusterFile: <path to x.509 membership certificate and key PEM file>
   bindIp: localhost,<hostname(s)|ip address(es)>

 

The configuration of ServerTwo likewise uses both a PEMKeyFile and a clusterFile.

That is why, on a replica set that comprises two computers, I have different .pem files.
You can see that the clusterFile is the same for the entire replica set.
When defining a replica set only on one computer we also had no problems.

Only when the replica set spans two or more computers have we encountered that error.

I would appreciate it if you could test the case where the replica set comprises at least two computers.

Thanks,

Itzik 

Comment by Danny Hatcher (Inactive) [ 30/Nov/18 ]

Hello Itzhak,

I have not been able to reproduce your issue locally and there are no known bugs in regards to x509 support on 4.0.

I do notice that you have two different PEM keyfiles listed in the Rep3Server2 config file while all the other servers have only one keyfile listed. Perhaps that is your issue?

Comment by Itzhak Kagan [ 30/Nov/18 ]

Hi Daniel

This Windows error does not occur when running MongoDB versions 3.4.6 and 3.6.8.
Did you try to simulate the process? 
How can you explain that?

I'm not saying that this is a MongoDB error, but maybe some change in the 4.0.x versions deals with an X509 issue that the earlier versions did not.

Maybe you can give me a hint about that?

Thanks,

Itzik

Comment by Danny Hatcher (Inactive) [ 29/Nov/18 ]

Hello Itzhak,

"lastHeartbeatMessage" : "Error connecting to ServerTwo:27011 (10.36.151.137:27011) :: caused by :: The Local Security Authority cannot be contacted"

This error message is triggered by Windows and not MongoDB. Unfortunately, I am not sure what is causing it. Please follow up with your Windows admins as they would be the best people to troubleshoot this Windows issue.

Thank you,

Danny
