[SERVER-3145] Replica Sets + Arbiter + bad hostname = wrong error message Created: 25/May/11  Updated: 09/Apr/15  Resolved: 09/Apr/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Gaetan Voyer-Perrault Assignee: Matt Dannenberg
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: RPL 1 04/03/15, RPL 2 04/24/15
Participants:

 Description   

To repro:

  • Start two replica set nodes
  • Run rs.add("arbiter-west:27017", true), where the host "arbiter-west" does not exist

You get the following error message:

{
"assertion" : "need most members up to reconfigure, not ok : arbiter-west",
"assertionCode" : 13144,
"errmsg" : "db assertion failure",
"ok" : 0
}

However, the logs look like the following:
Wed May 25 22:21:41 [conn2] replSet replSetReconfig config object parses ok, 3 members specified
Wed May 25 22:21:41 [conn2] warning: getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] warning: getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] warning: getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] getaddrinfo("arbiter-west") failed: Name or service not known
Wed May 25 22:21:41 [conn2] replSet cmufcc requestHeartbeat arbiter-west:27017 : 9001 socket exception [6]
Wed May 25 22:21:41 [conn2] replSet replSetReconfig exception: need most members up to reconfigure, not ok : arbiter-west:27017

Problem
======
The message in the shell does not point to the source of the error. The underlying failure is a name resolution error, but the exception shown in the shell reports a reconfiguration problem.

Expected Output
======
One of:
1. nested error messages
2. first error message ("name resolution failed" or "couldn't connect")
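
As an illustration of option 1, the following is a minimal sketch of how a nested error could carry the root cause up to the shell-facing message. It is not the server's implementation: wrapError and describeError are hypothetical helpers, and the intermediate heartbeat message is paraphrased from the log lines above.

```javascript
// Hypothetical helper: wraps a low-level failure in a higher-level error
// so the root cause is not lost. Not a MongoDB API.
function wrapError(message, cause) {
  const err = new Error(message);
  err.cause = cause; // plain property used only for this illustration
  return err;
}

// Hypothetical helper: flattens the chain into one line, outermost first.
function describeError(err) {
  const parts = [];
  for (let e = err; e; e = e.cause) {
    parts.push(e.message);
  }
  return parts.join("; caused by: ");
}

// Messages taken or paraphrased from the ticket's logs.
const dnsFailure = new Error(
  'getaddrinfo("arbiter-west") failed: Name or service not known'
);
const heartbeatFailure = wrapError(
  "requestHeartbeat arbiter-west:27017 failed",
  dnsFailure
);
const reconfigFailure = wrapError(
  "need most members up to reconfigure, not ok : arbiter-west:27017",
  heartbeatFailure
);

console.log(describeError(reconfigFailure));
```

With this shape, the shell would still lead with the reconfiguration failure, but the name resolution error would appear at the end of the same message instead of only in the server log.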



 Comments   
Comment by Matt Dannenberg [ 08/Apr/15 ]

The log output now looks like:

2015-04-08T13:13:03.799-0400 I NETWORK  [ReplExecNetThread-0] getaddrinfo("foo") failed: Name or service not known
2015-04-08T13:13:03.800-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to foo:27017; Location18915 Failed attempt to connect to foo:27017; couldn't initialize connection to host foo, address is invalid
2015-04-08T13:13:03.802-0400 I NETWORK  [ReplExecNetThread-1] getaddrinfo("foo") failed: Name or service not known
2015-04-08T13:13:03.803-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to foo:27017; Location18915 Failed attempt to connect to foo:27017; couldn't initialize connection to host foo, address is invalid
2015-04-08T13:13:03.805-0400 I NETWORK  [ReplExecNetThread-2] getaddrinfo("foo") failed: Name or service not known
2015-04-08T13:13:03.805-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to foo:27017; Location18915 Failed attempt to connect to foo:27017; couldn't initialize connection to host foo, address is invalid

and the errmsg looks like:

{
        "ok" : 0,
        "errmsg" : "Quorum check failed because not enough voting nodes responded; required 4 but only the following 3 voting nodes responded: dannenstation.local:27017, dannenstation.local:27018, dannenstation.local:27019; the following nodes did not respond affirmatively: foo:27017 failed with Failed attempt to connect to foo:27017; couldn't initialize connection to host foo, address is invalid, asdffoo:27017 failed with Failed attempt to connect to asdffoo:27017; couldn't initialize connection to host asdffoo, address is invalid, asdfasdfafoo:27017 failed with Failed attempt to connect to asdfasdfafoo:27017; couldn't initialize connection to host asdfasdfafoo, address is invalid",
        "code" : 74
}
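
The "required 4" in the errmsg above is consistent with a strict majority of the six voting nodes it lists (three that responded and three that failed). The sketch below works through that arithmetic. It assumes each listed node has exactly one vote, and requiredVotes and quorumSummary are hypothetical names, not server functions.

```javascript
// Strict majority of voting members (assumption: one vote per member).
function requiredVotes(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Hypothetical summary of a quorum check, loosely modeled on the errmsg.
function quorumSummary(responded, failed) {
  const required = requiredVotes(responded.length + failed.length);
  return {
    required: required,
    responded: responded.length,
    ok: responded.length >= required,
  };
}

// Host names taken from the errmsg in the comment above.
const summary = quorumSummary(
  [
    "dannenstation.local:27017",
    "dannenstation.local:27018",
    "dannenstation.local:27019",
  ],
  ["foo:27017", "asdffoo:27017", "asdfasdfafoo:27017"]
);

console.log(summary);
```

Under these assumptions, three of six responding nodes falls one short of the majority, which matches the failed quorum check reported with code 74.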

Generated at Thu Feb 08 03:02:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.