[SERVER-9908] On Linux, Socket::connect() timeout doesn't interrupt ConnectBG job Created: 12/Jun/13 Updated: 09/Oct/13 Resolved: 14/Jun/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | 2.2.4, 2.4.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Linux |
| Issue Links: | |
| Operating System: | Linux |
| Participants: | |
| Description |
|
Socket::connect() spawns a ConnectBG job that it is supposed to interrupt after a 5-second timeout. However, the job doesn't get interrupted on Linux (and SO_RCVTIMEO/SO_SNDTIMEO have no effect on connect timeouts on Linux). As a result, the effective connect timeout is the system default, which is a function of net.ipv4.tcp_syn_retries. This affects all intra-cluster connections made by mongod/mongos (and the shell/tools) in failure modes where SYN packets destined for the remote server are silently dropped. Reproduce with:
|
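For illustration only (this is not the server's code, and not the fix referenced in the comments): one common way to bound connect time independently of the kernel's SYN retry settings is a non-blocking connect followed by a bounded wait for the socket to become writable. The sketch below assumes IPv4 and a numeric address; `connect_with_timeout` is a hypothetical helper name, and the 5-second default only mirrors the timeout mentioned in the description.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout_secs=5.0):
    """Connect to (host, port), giving up after timeout_secs.

    Hypothetical helper for illustration. `host` is assumed to be a
    numeric IPv4 address so that no blocking DNS lookup is involved.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # A non-blocking connect returns immediately; EINPROGRESS means
        # the TCP handshake is still under way.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(err, os.strerror(err))

        # Wait until the socket is writable (handshake finished or
        # failed), but never longer than timeout_secs.
        _, writable, _ = select.select([], [sock], [], timeout_secs)
        if not writable:
            raise TimeoutError("connect timed out after %.1fs" % timeout_secs)

        # Writable does not imply success: SO_ERROR holds the result.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))
    except BaseException:
        sock.close()
        raise

    sock.setblocking(True)
    return sock
```

With this pattern the deadline is enforced by the wait in user space, so silently dropped SYN packets lead to a `TimeoutError` after `timeout_secs` rather than after the kernel has exhausted its SYN retries.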
| Comments |
| Comment by Kevin Pulo [ 09/Oct/13 ] |
|
Also, the portion of |
| Comment by J Rassi [ 14/Jun/13 ] |
|
This has been fixed in 2.5.0 by |
| Comment by J Rassi [ 14/Jun/13 ] |
|
He's saying that the 5-second timeout should be enforced, but it isn't. I rewrote the title and description, which should clarify the bug. |
| Comment by Eric Milkie [ 12/Jun/13 ] |
|
For clarification, are you saying that we should do something to more strictly enforce the 5s timeout, because 30s timeouts are problematic? Or are you saying we should change the socket timeout to 30s? |