[SERVER-8093] server can't find self in balance_repl.js Created: 07/Jan/13  Updated: 11/Jul/16  Resolved: 14/Jan/13

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.4.0-rc0

Type: Bug Priority: Major - P3
Reporter: Ian Whalen (Inactive) Assignee: Randall Hunt
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File balance_repl.txt    
Operating System: ALL
Participants:

 Description   

http://buildlogs.mongodb.org/Nightly%20Linux%20RHEL%2064-bit/builds/355/test/slow%20nightly/balance_repl.js



 Comments   
Comment by Randall Hunt [ 09/Jan/13 ]

Fixed the hosts file. It's generated (via chef) when we spin up the machines and doesn't get updated afterwards. I took this machine down to increase the size of the data volume, and it came back with a new internal IP.

Comment by Randolph Tan [ 07/Jan/13 ]

The test is failing because mongod cannot find itself in the config docs when performing replica set initialization. It uses getifaddrs followed by getnameinfo to collect its own known addresses, and getaddrinfo followed by getnameinfo to get the IP addresses of the host being compared. During the test run, this is what we get when comparing self to bs-rhel-57-64-2:

 m31100| Mon Jan  7 11:21:57.102 [conn1] REN: port: 31100, self: 31100
 m31100| Mon Jan  7 11:21:57.103 [conn1] REN: linux comp: mine: 127.0.0.1, theirs: 10.76.211.188
 m31100| Mon Jan  7 11:21:57.103 [conn1] REN: linux comp: mine: 10.12.15.100, theirs: 10.76.211.188
 m31100| Mon Jan  7 11:21:57.103 [conn1] REN: linux comp: mine: ::1, theirs: 10.76.211.188
 m31100| Mon Jan  7 11:21:57.103 [conn1] REN: linux comp: mine: fe80::1031:3dff:fe29:2096%eth0, theirs: 10.76.211.188
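
To make the comparison above concrete, here is a minimal Python sketch of the same idea. It is not the mongod code: the function names (resolve_host, is_self) are hypothetical, and the local address list is hard-coded from the log lines above rather than gathered with getifaddrs, which the Python standard library does not expose.

```python
import socket


def resolve_host(host, port):
    """Resolve a host name to numeric address strings via getaddrinfo.

    Illustrative helper only; not called below so the example stays offline.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the numeric address string.
    return {info[4][0] for info in infos}


def is_self(local_addrs, host_addrs, host_port, my_port):
    """Return True if the ports match and any resolved address is local."""
    if host_port != my_port:
        return False
    return bool(set(local_addrs) & set(host_addrs))


# "mine" values as printed in the log above (assumed to be the full list).
LOCAL_ADDRS = [
    "127.0.0.1",
    "10.12.15.100",
    "::1",
    "fe80::1031:3dff:fe29:2096%eth0",
]

# "theirs" value from the log: what bs-rhel-57-64-2 resolved to.
STALE_ADDRS = ["10.76.211.188"]

print(is_self(LOCAL_ADDRS, STALE_ADDRS, 31100, 31100))       # False
print(is_self(LOCAL_ADDRS, ["10.12.15.100"], 31100, 31100))  # True
```

With the address the name actually resolved to, no local address matches, so the host is never recognized as self; had the name resolved to 10.12.15.100, the check would pass.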

Here's the output for /sbin/ifconfig:

10.76.211.188 bs-rhel-57-64-2.10gen.cc bs-rhel-57-64-2
 
eth0      Link encap:Ethernet  HWaddr 12:31:3D:29:20:96  
          inet addr:10.12.15.100  Bcast:10.12.15.255  Mask:255.255.254.0
          inet6 addr: fe80::1031:3dff:fe29:2096/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:105713 errors:0 dropped:0 overruns:0 frame:0
          TX packets:61370 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:116679956 (111.2 MiB)  TX bytes:7657854 (7.3 MiB)
 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3118424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3118424 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1340873536 (1.2 GiB)  TX bytes:1340873536 (1.2 GiB)

hostname:

bs-rhel-57-64-2

ping bs-rhel-57-64-2

PING bs-rhel-57-64-2.10gen.cc (10.76.211.188) 56(84) bytes of data.
 
--- bs-rhel-57-64-2.10gen.cc ping statistics ---
377 packets transmitted, 0 received, 100% packet loss, time 376029ms

cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.76.211.188 bs-rhel-57-64-2.10gen.cc bs-rhel-57-64-2

So it appears that someone patched /etc/hosts to include this entry (and the entry may have been valid in the past). If I remove this line, mongod will not be able to resolve bs-rhel-57-64-2 at all. So I think we need to set up some dynamic name lookup service to resolve this issue.
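
As a rough illustration of how such a stale entry could be caught, the sketch below parses /etc/hosts-style text and reports addresses mapped to a name that are not among the machine's own addresses. The helper names (hosts_entries, stale_entries) are hypothetical, and the inputs are copied from the /etc/hosts and ifconfig output above rather than read from a live system.

```python
def hosts_entries(hosts_text, name):
    """Return the addresses that /etc/hosts-style text maps to `name`."""
    found = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        if name in names:
            found.append(addr)
    return found


def stale_entries(hosts_text, name, local_addrs):
    """Return addresses listed for `name` that are not local addresses."""
    return [a for a in hosts_entries(hosts_text, name) if a not in local_addrs]


# Contents of /etc/hosts as shown above.
HOSTS = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.76.211.188 bs-rhel-57-64-2.10gen.cc bs-rhel-57-64-2
"""

# Addresses taken from the ifconfig output above.
LOCAL_ADDRS = ["127.0.0.1", "10.12.15.100", "::1"]

print(stale_entries(HOSTS, "bs-rhel-57-64-2", LOCAL_ADDRS))  # ['10.76.211.188']
print(stale_entries(HOSTS, "localhost", LOCAL_ADDRS))        # []
```

A check like this only detects the mismatch; it does not replace keeping the hosts file (or a name service) in sync with the machine's current IP.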
