[SERVER-40822] MongoDB Replica set is not joining after changing the containers in Docker. Created: 24/Apr/19  Updated: 30/Apr/19  Resolved: 30/Apr/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: harshavardhan Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Docker container


Operating System: Linux
Participants:

 Description   

2019-04-24T09:51:43.453-0500 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1556117443, 1) (top of oplog: { ts: Timestamp(1556117485, 1), t: 43 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2019-04-24T09:51:43.453-0500 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1556117443, 1)
2019-04-24T09:51:43.453-0500 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1556117443, 1) } (exclusive) to { : Timestamp(1556117485, 1) } (inclusive).
2019-04-24T09:51:43.455-0500 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2019-04-24T09:51:43.455-0500 I NETWORK [initandlisten] waiting for connections on port 27017
2019-04-24T09:51:43.459-0500 W NETWORK [replexec-0] getaddrinfo("mongodb2") failed: Temporary failure in name resolution
2019-04-24T09:51:43.464-0500 W REPL [replexec-0] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 93501 for replica set ecf-replicas-set maps to this node" while validating { _id: "ecf-replicas-set", version: 93501, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 1, host: "mongodb1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongodb2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "mongodb3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: "majority", wtimeout: 30000 }, replicaSetId: ObjectId('5bda38cf657acc2a8da906e1') } }
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] New replica set config in use: { _id: "ecf-replicas-set", version: 93501, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 1, host: "mongodb1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongodb2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "mongodb3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: "majority", wtimeout: 30000 }, replicaSetId: ObjectId('5bda38cf657acc2a8da906e1') } }
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] This node is not a member of the config
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] transition to REMOVED from STARTUP
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] Starting replication storage threads
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] Starting replication fetcher thread
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] Starting replication applier thread
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] Starting replication reporter thread
2019-04-24T09:51:43.464-0500 I REPL [rsSync-0] Starting oplog application
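
Two lines in the log above bear most directly on the symptom: the getaddrinfo warning and the "This node is not a member of the config" message. As a minimal sketch, assuming mongod log lines in the 4.0 text format shown here, the following Python pulls both out of a log. The function name summarize and the regular expression are illustrative and are not part of any MongoDB tooling.

```python
import re

# Matches warnings such as:
#   getaddrinfo("mongodb2") failed: Temporary failure in name resolution
GETADDRINFO_RE = re.compile(r'getaddrinfo\("([^"]+)"\) failed: (.+)$')
NOT_MEMBER = "This node is not a member of the config"


def summarize(lines):
    """Collect hostnames that failed to resolve and whether the node
    reported that it is not a member of the replica set config."""
    unresolved = []
    not_member = False
    for line in lines:
        match = GETADDRINFO_RE.search(line)
        if match:
            unresolved.append((match.group(1), match.group(2).strip()))
        if NOT_MEMBER in line:
            not_member = True
    return {"unresolved": unresolved, "not_member": not_member}
```

Passing an open log file to summarize() returns the failed hostnames alongside a flag for the membership message, which makes it easier to see whether the two events occur in the same startup.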



 Comments   
Comment by Danny Hatcher (Inactive) [ 30/Apr/19 ]

That indicates to me that the container was then able to resolve the hostname, which allowed the node to be recognized as a member of the replica set.

The SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Comment by harshavardhan [ 29/Apr/19 ]

 No, we are not seeing any error message after running rs.reconfig().

Comment by Danny Hatcher (Inactive) [ 29/Apr/19 ]

The name resolution failure can be temporary; MongoDB will try to resolve the hostname again, and it appears that it succeeded. Are you still seeing the warning messages in the logs?

W NETWORK [replexec-0] getaddrinfo("mongodb2") failed: Temporary failure in name resolution

Comment by harshavardhan [ 29/Apr/19 ]

If this is caused by a DNS issue, why was it resolved after I ran rs.reconfig() without restarting the container? Can I get any help here?

 

Comment by Danny Hatcher (Inactive) [ 29/Apr/19 ]

This appears to be a DNS issue that prevents MongoDB from associating this node's hostname with a member in the replica set config:

2019-04-24T09:51:43.459-0500 W NETWORK [replexec-0] getaddrinfo("mongodb2") failed: Temporary failure in name resolution
...
2019-04-24T09:51:43.464-0500 I REPL [replexec-0] This node is not a member of the config

If you restart the container, does the "Temporary failure in name resolution" error happen again? If so, you need to investigate the underlying DNS issues, which are outside the scope of MongoDB.
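
One way to check name resolution independently of mongod is to resolve each replica set hostname from inside the container. The following is a minimal sketch in Python, assuming the hostnames from the config in the log (mongodb1, mongodb2, mongodb3) and the default port 27017. The helper check_hosts and its resolver parameter are hypothetical names introduced for illustration; the resolver defaults to the standard library's socket.getaddrinfo.

```python
import socket

# Hostnames taken from the replica set config in the log; adjust as needed.
REPLICA_HOSTS = ["mongodb1", "mongodb2", "mongodb3"]


def check_hosts(hosts, port=27017, resolver=socket.getaddrinfo):
    """Try to resolve each hostname.

    Returns a dict mapping each host to None on success, or to the
    resolver's error message on failure.
    """
    results = {}
    for host in hosts:
        try:
            resolver(host, port)
            results[host] = None
        except socket.gaierror as exc:
            # The same class of failure that mongod logs as
            # 'getaddrinfo("...") failed: ...'
            results[host] = str(exc)
    return results
```

Calling check_hosts(REPLICA_HOSTS) in each container right after a restart, and again a few seconds later, should show whether the failure is transient or persistent. A host that maps to an error string on every attempt points to a DNS or Docker networking problem rather than to MongoDB.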

Generated at Thu Feb 08 04:56:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.