[SERVER-17102] primary fails to rejoin set on restart Created: 28/Jan/15 Updated: 29/Jan/15 Resolved: 28/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.0-rc6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Adam Midvidy | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
RHEL 5.5 32-bit |
||
| Attachments: |
|
|||
| Backwards Compatibility: | Fully Compatible | |||
| Operating System: | Linux | |||
| Steps To Reproduce: | Start a 3-node replica set: 2 mongods and one arbiter. One mongod has priority 99, the other has priority 1.1. Start the replica set; the high-priority node becomes primary as expected. Restart the primary after doing a few ops. The primary fails to rejoin the set with error:
|
|||
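The topology in the steps above can be sketched as a replica set config document. This is only an illustration: the set name, hosts, and ports are hypothetical (the ticket does not give them), and the second priority is read as 1.1.

```python
def build_repl_config(set_name, hosts):
    """Build a replica set config document: two data nodes plus one arbiter."""
    primary_host, secondary_host, arbiter_host = hosts
    return {
        "_id": set_name,
        "members": [
            # High-priority node, expected to become primary.
            {"_id": 0, "host": primary_host, "priority": 99},
            # Low-priority data node.
            {"_id": 1, "host": secondary_host, "priority": 1.1},
            # Arbiter: votes in elections but holds no data.
            {"_id": 2, "host": arbiter_host, "arbiterOnly": True},
        ],
    }


# Hypothetical set name and addresses, for illustration only.
config = build_repl_config(
    "repl0", ["localhost:27017", "localhost:27018", "localhost:27019"]
)
```

A document of this shape is what would be passed to the replica set initiation command; the test harness in the ticket (mongo-orchestration) may build it differently.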
| Participants: |
| Description |
|
Found in the cxx driver test suite on a RHEL 5.5 32-bit host. Server git version: ac9ee2fb80f2afc2737a0d9f346cff8117a82af2 |
| Comments |
| Comment by Adam Midvidy [ 28/Jan/15 ] | |
|
Scott, having mongo-orchestration set bindIp=127.0.0.1 seems to resolve the issue on our side. Do you want logs for a successful run? | |
| Comment by Scott Hernandez (Inactive) [ 28/Jan/15 ] | |
|
When it is stopped, are there any connections open on port 1053? If you start it with bindIp="127.0.0.1", is it fine? Can you post the logs for those runs as well? Thanks. | |
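One rough way to answer the port question is to try a TCP connection after the process has stopped. This is a minimal sketch: the helper name is hypothetical, and it only tells you whether something still accepts connections on the port, not which established connections remain (a tool such as netstat would be needed for that).

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_accepts_connections("127.0.0.1", 1053)` returning True after the mongod was killed would suggest that the old process, or something else, is still bound to the port.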
| Comment by Adam Midvidy [ 28/Jan/15 ] | |
|
I stopped the process again using 'kill' and restarted it with the same config. It still transitioned to the "REMOVED" state. | |
| Comment by Scott Hernandez (Inactive) [ 28/Jan/15 ] | |
|
Also, you should use the same form, either 127.0.0.1 or localhost, in both the bindIp and the replica set config; mixing the two can lead to hard-to-detect errors. | |
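The consistency advice above can be checked mechanically before starting the set. This is a sketch under stated assumptions: the function name is hypothetical, bindIp is assumed to be a single address, member hosts are assumed to be `host:port` strings without IPv6 literals, and the comparison is a plain string match, so `localhost` and `127.0.0.1` are deliberately treated as different.

```python
def hosts_not_matching_bind_ip(bind_ip, member_hosts):
    """Return member host strings whose hostname part differs from bind_ip."""
    mismatched = []
    for member in member_hosts:
        # Strip an optional ":port" suffix to get the hostname part.
        hostname = member.rsplit(":", 1)[0]
        if hostname != bind_ip:
            mismatched.append(member)
    return mismatched


# Hypothetical ports; "localhost:27018" is flagged because bindIp is 127.0.0.1.
print(hosts_not_matching_bind_ip("127.0.0.1", ["127.0.0.1:27017", "localhost:27018"]))
```

An empty result means every member address uses the same form as bindIp.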
| Comment by Scott Hernandez (Inactive) [ 28/Jan/15 ] | |
|
The line above the one you included shows that the node could not connect to itself, so it could not verify from the config that it was a member of the replica set. It therefore moved to the REMOVED state, as is appropriate:
Are you sure the process stopped correctly before restarting? How was it stopped? If you restart it again, does it work? | |
| Comment by Adam Midvidy [ 28/Jan/15 ] | |
|
added logs |