[SERVER-33441] ReplicaSet Issue on MongoDB Upgrade from 3.4.5 to 3.6.2 Created: 22/Feb/18  Updated: 27/Oct/23  Resolved: 05/Mar/18

Status: Closed
Project: Core Server
Component/s: Admin, Replication
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Manisha Pande
Assignee: Dmitry Agranat
Resolution: Works as Designed
Votes: 0
Labels: Bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: AWS
Attachments: File rs.status()     File vcp1-master-0-diagnostic.data.tar.gz     Text File vcp1-master-0.txt     File vcp1-master-1-diagnostic.data.tar.gz     Text File vcp1-master-1-mongod.log     Text File vcp1-master-1.txt     File vcp1-master-2-diagnostic.data.tar.gz     Text File vcp1-master-2.txt    
Participants:

 Description   

We upgraded MongoDB from 3.4.5 to 3.6.2 on CentOS. After the upgrade, the replica set members are not syncing or connecting to each other. To resolve this we tried to resync the replica set configuration, but it is not working.

We have three MongoDB nodes: one primary and two secondaries.

Commands followed for the upgrade:
1. Created the mongodb-org-3.6.repo file in the /etc/yum.repos.d folder

[mongodb-org-3.6]
baseurl = http://xyz/mongodb-3.6
enabled = 1
gpgcheck = 0
gpgkey = http://xyz/mongodb-3.6/repodata/repomd.xml.asc
name = MongoDB Repository 3.6

2. Connected to a secondary MongoDB node and ran an admin command to check the feature compatibility version

rs0:SECONDARY> db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )
{ "featureCompatibilityVersion" : "3.4", "ok" : 1 }

3. Stopped MongoDB on the secondary node

[centos@vcp1-master-1]$ sudo systemctl stop mongod

4. Installed MongoDB

[centos@vcp1-master-1]$ sudo yum -y install mongodb-org

5. Started MongoDB

[centos@vcp1-master-1]$ sudo systemctl start mongod

6. Connected to the primary and ran rs.status() to check the status of the secondary
The rs.status() output is attached.

The same issue occurs with the other secondary as well. Although the nodes are up and running, the replica set members show "Connection refused".
When we connect to any secondary node and run rs.status(), it shows that node as connected and the other members as connection refused (the same happens on the other secondary).

As soon as the upgrade was applied to both secondary replicas, they automatically tried to become primary.

Logs:

2018-02-21T15:46:45.523+0000 I REPL     [replexec-67] VoteRequester(term 413) received a yes vote from vcp1-master-0.asml.tibco.aws:27040; response message: { term: 413, voteGranted: true, reason: "", ok: 1.0 }
2018-02-21T15:46:45.523+0000 I REPL     [replexec-67] election succeeded, assuming primary role in term 413
2018-02-21T15:46:45.523+0000 I REPL     [replexec-67] transition to PRIMARY from SECONDARY
2018-02-21T15:46:45.523+0000 I REPL     [replexec-67] Entering primary catch-up mode.
2018-02-21T15:46:45.523+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to vcp1-master-1.asml.tibco.aws:27040
2018-02-21T15:46:45.524+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to vcp1-master-1.asml.tibco.aws:27040 - HostUnreachable: Connection refused
2018-02-21T15:46:45.524+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to vcp1-master-1.asml.tibco.aws:27040 due to failed operation on a connection
2018-02-21T15:46:45.524+0000 I REPL_HB  [replexec-66] Error in heartbeat (requestId: 18210) to vcp1-master-1.asml.tibco.aws:27040, response status: HostUnreachable: Connection refused
2018-02-21T15:46:45.524+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to vcp1-master-1.asml.tibco.aws:27040
2018-02-21T15:46:45.525+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to vcp1-master-1.asml.tibco.aws:27040 - HostUnreachable: Connection refused
2018-02-21T15:46:45.525+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to vcp1-master-1.asml.tibco.aws:27040 due to failed operation on a connection
2018-02-21T15:46:45.525+0000 I REPL_HB  [replexec-55] Error in heartbeat (requestId: 18213) to vcp1-master-1.asml.tibco.aws:27040, response status: HostUnreachable: Connection refused
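
The "HostUnreachable: Connection refused" lines above mean the TCP connection to port 27040 was actively rejected, which typically happens when nothing is listening on that address and port. As a rough way to confirm this from another replica set member, a minimal Python sketch follows. The function name can_connect is hypothetical and not part of any MongoDB tooling; the host and port in the usage note are the ones that appear in the log excerpt.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and DNS failures.
        return False
```

For example, calling can_connect("vcp1-master-1.asml.tibco.aws", 27040) from the primary and then can_connect("127.0.0.1", 27040) on vcp1-master-1 itself would show whether mongod accepts only local connections. This checks TCP reachability only; it does not speak the MongoDB wire protocol.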



 Comments   
Comment by Dmitry Agranat [ 05/Mar/18 ]

Hi mpande,

Glad to hear that using the net.bindIp configuration file setting resolved the issue.

Regarding your question: "I was checking custom rules for the same. Could you please provide me an example of custom rules."

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Thanks,
Dima

Comment by Manisha Pande [ 05/Mar/18 ]

Thanks Dima.

I have set net.bindIp to 0.0.0.0 in the mongod.conf file on all MongoDB nodes and it is working, but I want to restrict connections to the MongoDB nodes.

I was looking into custom rules for this. Could you please provide me an example of custom rules?

Comment by Dmitry Agranat [ 28/Feb/18 ]

Hi mpande,

Thank you for providing the requested information. From the vcp1-master-1-mongod.log I can see:

2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten]
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server.
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
2018-02-27T15:12:57.495+0000 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.

As described in our 3.6 upgrade documentation, you will need to use the net.bindIp configuration file setting or the --bind_ip command-line option to specify the list of IP addresses the server should listen on.
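
The startup warning quoted above ("This server is bound to localhost") is the key symptom: a mongod that listens only on loopback addresses cannot be reached by the other replica set members. As an illustration only, the Python sketch below classifies a comma-separated bindIp value as loopback-only or not. The helper name is_loopback_only is hypothetical, and treating any hostname other than "localhost" as non-loopback without resolving it is a simplifying assumption.

```python
import ipaddress


def is_loopback_only(bind_ip: str) -> bool:
    """Return True if every entry in a comma-separated bindIp value is loopback."""
    entries = [e.strip() for e in bind_ip.split(",") if e.strip()]
    if not entries:
        raise ValueError("bind_ip must contain at least one address")
    for entry in entries:
        if entry == "localhost":
            continue
        try:
            if not ipaddress.ip_address(entry).is_loopback:
                return False
        except ValueError:
            # Not an IP literal: assume a hostname that is not loopback.
            return False
    return True
```

A value such as "127.0.0.1" is loopback-only, so remote members would be refused, while a value that also lists an address of the host's own network interface (a placeholder such as "127.0.0.1,10.0.0.5") is not.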

Please follow these steps and let us know if you still encounter this issue.

Thanks,
Dima

Comment by Manisha Pande [ 27/Feb/18 ]

Hi Dima,

I have attached requested logs.

I provisioned the cluster in AWS, installed MongoDB 3.4, and upgraded only one MongoDB node (vcp1-master-1).
The replica set has three nodes:
vcp1-master-0 (PRIMARY)
vcp1-master-1 (SECONDARY)
vcp1-master-2 (SECONDARY)

Regards,
Manisha Pande

Comment by Dmitry Agranat [ 25/Feb/18 ]

Hi mpande,

Thank you for the report. To get some more insight into this issue, could you please provide the following:

  • Archive (tar or zip) the $dbPath/diagnostic.data directory from all members of the replica set
  • Compressed mongod logs from all members of the replica set, covering the time before and after the upgrade
  • The output of the rs.conf() command

This should provide some information to help diagnose this.

Thanks,
Dima
