[SERVER-16818] Add socket timeout to isSelf replication check Created: 13/Jan/15 Updated: 23/Jan/15 Resolved: 15/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.6, 2.8.0-rc4 |
| Fix Version/s: | 3.0.0-rc6 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Joanna Cheng | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Steps To Reproduce: |
|
||||||||
| Participants: | |||||||||
| Description |
|
When a mongod starts with --replSet and finds a config in local.system.replset, it tries to establish connections to the other replica set members. These initial connection attempts do not appear to be timed out, so the node can hang indefinitely waiting for a response from a replica set member that is down. By contrast, when an existing, running replica set member discovers a new member (via rs.add) and the new member is unreachable, the existing member does time out the connection attempt. This ticket requests that the initial connection attempts be timed out in the same way. In the repro given, the node is in SECONDARY before the mongod is restarted, and it should be able to return to SECONDARY after the restart. Note: Adding a third node avoids the problem; it seems that only a majority of members needs to be contacted for the config load to succeed. |
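The difference between a bounded and an unbounded connection attempt, as described above, can be sketched as follows. This is a minimal illustration in Python rather than the server's actual code: `probe_member`, its `ping` payload, and the 5-second default are hypothetical names and values chosen for the example, and only the standard `socket` module is used.

```python
import socket


def probe_member(host, port, timeout_secs=5.0):
    """Return True if the peer sends any data within timeout_secs, else False.

    Hypothetical helper for illustration; it does not speak the MongoDB
    wire protocol and is not how the server implements the check.
    """
    try:
        # The timeout bounds the connect step. create_connection also leaves
        # it set on the returned socket, so later send/recv calls are bounded.
        with socket.create_connection((host, port), timeout=timeout_secs) as sock:
            sock.sendall(b"ping\n")
            # Without a timeout, this recv would block for as long as the
            # peer stays silent, which is the hang described in the ticket.
            return len(sock.recv(1)) > 0
    except OSError:
        # socket.timeout and ConnectionRefusedError are OSError subclasses,
        # so an unreachable or silent member is reported as not contactable.
        return False
```

With a timeout in place, an unresponsive member costs the caller at most roughly `timeout_secs` per blocking call instead of stalling startup indefinitely.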
| Comments |
| Comment by Githook User [ 15/Jan/15 ] | |||||||||||||||||||||||||||||
|
Author: Scott Hernandez (scotthernandez) <scotthernandez@gmail.com> Message: | |||||||||||||||||||||||||||||
| Comment by Scott Hernandez (Inactive) [ 13/Jan/15 ] | |||||||||||||||||||||||||||||
|
This "repro" seems to be for the shell, not the server behavior. The shell does not have a timeout and is expected to wait until the system returns an error or data for connections and reads. If that is what you want changed or improved, please open a new issue for the shell and remove the shell-related material from this issue. | |||||||||||||||||||||||||||||
| Comment by Joanna Cheng [ 13/Jan/15 ] | |||||||||||||||||||||||||||||
|
Not reproducible in 2.4.12; the node comes back as SECONDARY. In 2.8.0-rc4, my mongo shell just hangs when trying to connect to the restarted node.
Verbose logs show we're getting stuck on isMaster.
|