[SERVER-4094] better mongos handling of state where connections can be established but mongod unresponsive Created: 18/Oct/11 Updated: 06/Dec/22 Resolved: 31/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 4 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Description |
|
It's possible for mongos to get into a hung state in a sharded cluster that has a replica set shard. If the shard primary continues to accept connections but does not respond to any other request, failover will occur and a new primary will be elected normally, but sharded queries via mongos will block and never return. This failure mode has been observed on EC2. Can reproduce locally by : |
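The failure mode described above (a peer that completes the TCP handshake but never answers) can be imitated in isolation. The following is a hypothetical sketch only, not the reproduction steps from this ticket and not mongos code: a listener that never calls accept() stands in for the unresponsive mongod, and the function name `probe_silent_peer` and the timeout value are assumptions made for illustration.

```python
import socket


def probe_silent_peer(timeout_s=0.2):
    """Connect to a listener that never replies and report what the client sees."""
    # The kernel backlog completes the handshake, but nothing ever accepts or replies.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.settimeout(timeout_s)
        client.connect(listener.getsockname())  # succeeds: connection is established
        client.sendall(b"ping")
        try:
            client.recv(4096)  # without a timeout this would block indefinitely
            return "replied"
        except socket.timeout:
            return "connected but unresponsive"
    finally:
        client.close()
        listener.close()
```

Without the `settimeout` call, the `recv` would wait forever, which is the kind of blocking the description reports for sharded queries.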
| Comments |
| Comment by Ratika Gandhi [ 31/May/19 ] |
|
TCP heartbeats should solve the problem of black-holing in the network.
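If "TCP heartbeats" here means TCP keepalive probes (an assumption, the comment does not say), enabling them on a client socket could look like the sketch below. The helper name `enable_keepalive` and the timing values are hypothetical, and the three `TCP_KEEP*` options are platform-specific, so they are applied only where the `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive probes for an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning knobs are not available on every platform, so look them up defensively.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Keepalive probes are answered by the peer's kernel, so they detect an unreachable host or a black-holed path, not a process that is running but hung.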
| Comment by Eric Milkie [ 17/Jan/12 ] |
|
Upon further discussion, it sounds like it would be better to deliver a SIGHUP signal to the thread blocked in the recv(), and set up a signal handler just for this area of code. |
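The signal approach can be illustrated outside the server with a small POSIX-only sketch. This is not the mongos implementation: in Python, signal handlers run only in the main thread, so here the main thread plays the role of the thread blocked in recv(), and the names `RecvInterrupted` and `recv_or_interrupt` are hypothetical.

```python
import signal
import socket
import threading


class RecvInterrupted(Exception):
    """Raised by the signal handler to abandon a blocked recv()."""


def _on_sighup(signum, frame):
    raise RecvInterrupted()


def recv_or_interrupt(sock, delay_s=0.2):
    """Block in recv(); return None if SIGHUP interrupts the call first."""
    blocked_thread = threading.get_ident()  # must be the main thread in Python
    previous = signal.signal(signal.SIGHUP, _on_sighup)
    # A helper thread delivers SIGHUP to the blocked thread after delay_s seconds.
    waker = threading.Timer(
        delay_s, signal.pthread_kill, args=(blocked_thread, signal.SIGHUP)
    )
    try:
        waker.start()
        return sock.recv(4096)
    except RecvInterrupted:
        return None
    finally:
        waker.cancel()
        signal.signal(signal.SIGHUP, previous)  # handler is scoped to this call
```

Installing the handler only around the recv() mirrors the idea of a handler "just for this area of code". A production version would also need to handle a signal that arrives just after recv() returns.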
| Comment by Eric Milkie [ 17/Jan/12 ] |
|
It looks like it would be okay to close the socket as a way of freeing up the thread blocked on a recv() of the dead socket. |
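A minimal sketch of freeing a reader from another thread follows. Note the swapped-in technique: it calls shutdown() before close(), because closing a descriptor from another thread does not reliably wake a recv() already in progress on every platform. The function name `unblock_reader_by_shutdown` and the use of a local socket pair as the dead peer are assumptions for illustration.

```python
import socket
import threading
import time


def unblock_reader_by_shutdown(delay_s=0.2):
    """Wake a thread blocked in recv() by shutting the socket down from outside."""
    ours, silent_peer = socket.socketpair()  # silent_peer never sends anything
    result = {}

    def reader():
        try:
            result["data"] = ours.recv(4096)  # blocks until shutdown() below
        except OSError as exc:
            result["error"] = exc

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    time.sleep(delay_s)               # give the reader time to block
    ours.shutdown(socket.SHUT_RDWR)   # wakes the reader with end-of-stream
    thread.join(timeout=2.0)
    unblocked = not thread.is_alive()
    ours.close()
    silent_peer.close()
    return unblocked, result
```

The reader sees an empty read (or an error, depending on the platform) and can then discard the connection.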
| Comment by Greg Studer [ 18/Oct/11 ] |
|
Reproduced in master |
| Comment by Greg Studer [ 18/Oct/11 ] |
|
To clarify, only tried with 2.0.0; not sure if other versions are affected.