[SERVER-26702] Config Server connection refused Created: 19/Oct/16  Updated: 03/Aug/17  Resolved: 11/Jul/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Darshan Shah Assignee: Kelsey Schubert
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongodb_configdb.log.2016-10-19T19-19-55.gz    
Operating System: ALL
Participants:

 Description   

Of the 3 nodes in the CSRS, one consistently has connection refused problems, as seen in its own log as well as in the logs of the other config servers and of every mongos:

2016-10-19T09:01:10.957-0400 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 10.170.7.16:29102, reason: errno:111 Connection refused
2016-10-19T09:01:21.945-0400 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 10.170.7.16:29102, reason: errno:111 Connection refused
2016-10-19T09:01:32.933-0400 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 10.170.7.16:29102, reason: errno:111 Connection refused
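
To quantify how regular these failures are, the warnings can be parsed and the gaps between them measured. The following is only a minimal sketch: the helper names `refused_timestamps` and `gaps_seconds` are hypothetical, and the input is limited to the three lines quoted above.

```python
from datetime import datetime

# The three warnings quoted in the description, copied verbatim.
LOG_LINES = [
    "2016-10-19T09:01:10.957-0400 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 10.170.7.16:29102, reason: errno:111 Connection refused",
    "2016-10-19T09:01:21.945-0400 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 10.170.7.16:29102, reason: errno:111 Connection refused",
    "2016-10-19T09:01:32.933-0400 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 10.170.7.16:29102, reason: errno:111 Connection refused",
]


def refused_timestamps(lines, target="10.170.7.16:29102"):
    """Return the timestamps of errno 111 warnings logged for `target`."""
    stamps = []
    for line in lines:
        if f"Failed to connect to {target}" in line and "errno:111" in line:
            raw = line.split(" ", 1)[0]  # leading ISO-8601 timestamp
            stamps.append(datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z"))
    return stamps


def gaps_seconds(stamps):
    """Return the interval in seconds between consecutive timestamps."""
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(gaps_seconds(refused_timestamps(LOG_LINES)))  # [10.988, 10.988]
```

For the lines above, both gaps come out at roughly 11 seconds, which suggests a periodic retry by the monitor rather than a burst of client traffic.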

This is the third node of the CSRS, and its dbpath points to a NetApp volume. The limits are set quite high and the cluster is under minimal load, so a shortage of available sockets or file descriptors should not be the cause.
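
One way to check from another host whether the port is actively refusing connections (errno 111, ECONNREFUSED, which typically means nothing is listening on that port) rather than timing out is a small TCP probe. This is a hedged sketch, not part of any MongoDB tooling: the function name `probe` and its return labels are assumptions made for illustration.

```python
import errno
import socket


def probe(host, port, timeout=2.0):
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error:<errno name>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target host answered but rejected the connection (ECONNREFUSED).
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        # Any other failure, such as name resolution or an unreachable host.
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    # Address and port taken from the log lines in this report.
    print(probe("10.170.7.16", 29102))
```

Running the probe in a loop during an incident and recording the result alongside a timestamp would show whether the refusals line up with the mongod process being down.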

The sharded cluster is running MongoDB 3.2.5 with WiredTiger.



 Comments   
Comment by Kelsey Schubert [ 11/Jul/17 ]

Hi darshan.shah@interactivedata.com,

Sorry for the delay getting back to you. Unfortunately, we cannot reproduce this issue, and it's likely that the root cause of this behavior is outside of MongoDB.

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-user group.

Kind regards,
Thomas

Comment by Darshan Shah [ 20/Oct/16 ]

Log file for the node reboot just after the problem occurred.

Comment by Darshan Shah [ 20/Oct/16 ]

This is the output from rs.status() as of now:

csReplSet:PRIMARY> rs.status()
{
        "set" : "csReplSet",
        "date" : ISODate("2016-10-20T13:29:32.378Z"),
        "myState" : 1,
        "term" : NumberLong(15),
        "configsvr" : true,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "mongoconfigserver1:29102",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 57896,
                        "optime" : {
                                "ts" : Timestamp(1476970172, 3),
                                "t" : NumberLong(15)
                        },
                        "optimeDate" : ISODate("2016-10-20T13:29:32Z"),
                        "electionTime" : Timestamp(1476912287, 1),
                        "electionDate" : ISODate("2016-10-19T21:24:47Z"),
                        "configVersion" : 3,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "mongoconfigserver2:29102",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 57890,
                        "optime" : {
                                "ts" : Timestamp(1476970171, 8),
                                "t" : NumberLong(15)
                        },
                        "optimeDate" : ISODate("2016-10-20T13:29:31Z"),
                        "lastHeartbeat" : ISODate("2016-10-20T13:29:32.094Z"),
                        "lastHeartbeatRecv" : ISODate("2016-10-20T13:29:31.364Z"),
                        "pingMs" : NumberLong(28),
                        "syncingTo" : "mongoconfigserver1:29102",
                        "configVersion" : 3
                },
                {
                        "_id" : 2,
                        "name" : "mongoconfigserver3:29102",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 57890,
                        "optime" : {
                                "ts" : Timestamp(1476970171, 4),
                                "t" : NumberLong(15)
                        },
                        "optimeDate" : ISODate("2016-10-20T13:29:31Z"),
                        "lastHeartbeat" : ISODate("2016-10-20T13:29:31.508Z"),
                        "lastHeartbeatRecv" : ISODate("2016-10-20T13:29:31.563Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "mongoconfigserver1:29102",
                        "configVersion" : 3
                }
        ],
        "ok" : 1
}
csReplSet:PRIMARY>

This issue is intermittent: the config server works fine before and after the block of time when the issue occurs. I will attach a log file from a restart of the node just after the problem occurred yesterday.
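
Because the problem is intermittent, it may help to reduce each rs.status() snapshot to a per-member summary that can be compared over time. The sketch below is an illustration only: it works on a hand-transcribed subset of the output above, `optime_ts` is a hypothetical flattened field holding the seconds part of each member's optime Timestamp, and `summarize` is not a MongoDB API.

```python
# Subset of the rs.status() output above, transcribed as plain Python data.
# "optime_ts" is a hypothetical field: the seconds part of optime.ts.
MEMBERS = [
    {"name": "mongoconfigserver1:29102", "health": 1,
     "stateStr": "PRIMARY", "optime_ts": 1476970172},
    {"name": "mongoconfigserver2:29102", "health": 1,
     "stateStr": "SECONDARY", "optime_ts": 1476970171},
    {"name": "mongoconfigserver3:29102", "health": 1,
     "stateStr": "SECONDARY", "optime_ts": 1476970171},
]


def summarize(members):
    """Map each member name to its health flag and its lag behind the primary."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    report = {}
    for m in members:
        report[m["name"]] = {
            "healthy": m["health"] == 1,
            # Whole seconds between the primary's optime and this member's.
            "lag_seconds": primary["optime_ts"] - m["optime_ts"],
        }
    return report


if __name__ == "__main__":
    for name, info in summarize(MEMBERS).items():
        print(name, info)
```

For this snapshot every member reports healthy and the secondaries are one second behind the primary, which matches the observation that the set looks normal outside the failure window.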

Comment by Ramon Fernandez Marina [ 20/Oct/16 ]

My apologies, I should have also asked for the output of rs.status(), which should tell us more about the unreachability of one of its members – can you please send that as well?

One thing you can try is to reboot the node that's having problems and capture the log. I'm looking for useful startup warnings/errors that may tell us more. You can also try to resync the failing node.

Comment by Darshan Shah [ 20/Oct/16 ]

Here is the output from rs.conf():

csReplSet:PRIMARY> rs.conf()
{
        "_id" : "csReplSet",
        "version" : 3,
        "configsvr" : true,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 0,
                        "host" : "mongoconfigserver1:29102",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
 
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 1,
                        "host" : "mongoconfigserver2:29102",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
 
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 2,
                        "host" : "mongoconfigserver3:29102",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
 
                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "getLastErrorModes" : {
 
                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("57f818631233dad8f8b60814")
        }
}
csReplSet:PRIMARY>

Unfortunately, the log rolled over, so it is not available for that particular time frame. I will keep monitoring and save it next time.
Please let me know if I can do anything else to help debug this.

Thanks,
Darshan.

Comment by Ramon Fernandez Marina [ 19/Oct/16 ]

Unfortunately there's not enough information in the log snippet you sent to determine if you've found a bug or if this is a configuration issue. Can you please upload the following?

  • Full logs for this config server
  • The output of rs.conf()

Thanks,
Ramón.
