[CDRIVER-2172] Call "ping" on a socket that has been idle for socketCheckIntervalMS Created: 01/Jun/17 Updated: 28/Oct/23 Resolved: 16/Jun/17 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc |
| Affects Version/s: | None |
| Fix Version/s: | 1.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jeremy Mikola | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Description |
|
Similar to Using the basic case of a standalone topology as an example, I created a basic PHP script that connects to localhost:27017 and issues a ping command and served this through a single-worker web server to isolate our libmongoc client persistence to all requests. When I restarted the mongod server between requests (within the span of two seconds), the socket is left in the CLOSE_WAIT state and the subsequent request encounters a "Stream is closed" error. socketCheckIntervalMS does not seem relevant here, as the _mongoc_cluster_check_interval() will return a "Stream is closed" error before deciding if socketCheckIntervalMS warrants an isMaster command. The original test used our default server selection (serverSelectionTryOnce=true). When I switched to serverSelectionTryOnce=false, I did not notice any change in behavior. Is that by design, since libmongoc has technically left the server selection loop at this point? If that is the case, I think it supports the idea of retrying connections in this case. I did notice that tuning heartbeatFrequencyMS lower to ensure a topology update between restarting mongod and the subsequent PHP request did avoid a "Stream is closed" error being reported to the user. Note: if this issue is worth implementing, we should probably also revise the SDAM spec accordingly. I can create that ticket if needed. |
| Comments |
| Comment by Githook User [ 16/Jun/17 ] | |||||||||||||||||
|
Author: {u'username': u'ajdavis', u'name': u'A. Jesse Jiryu Davis', u'email': u'jesse@mongodb.com'}Message: Server Selection Spec update for single-threaded drivers. If a server is selected that has an existing connection that has been idle for | |||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 14/Jun/17 ] | |||||||||||||||||
|
We've updated the specs: https://github.com/mongodb/specifications/commit/4369d5667b663ab07b2928ad04fdec1b56b6c20f What the driver should do is re-check an idle socket right after server selection. If a server is selected that has an existing connection that's been idle for 5 seconds, the driver should attempt to call "ping", not ismaster, since we're not trying to update the topology. If "ping" succeeds, use the socket. If it failed, set the server's type to Unknown and re-enter server selection once, re-discovering the topology if needed (as SDAM already specifies), then proceed from there. | |||||||||||||||||
| Comment by Githook User [ 09/Jun/17 ] | |||||||||||||||||
|
Author: {u'username': u'ajdavis', u'name': u'A. Jesse Jiryu Davis', u'email': u'jesse@mongodb.com'}Message: If a server is selected and its socket has not been used in 5 seconds, | |||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 09/Jun/17 ] | |||||||||||||||||
|
I'm reopening this. Depending on how the related Server Discovery And Monitoring Spec change is resolved, this may require more work. At the very least, the logic should move from mongoc_client_select_server into mongoc_topology_select_server so that all single-threaded code paths benefit from the change, not just one function. | |||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 06/Jun/17 ] | |||||||||||||||||
|
The PHP driver uses the C driver like:
The best solution I can think of to make the above logic more resilient is to update mongoc_client_select_server:
Additionally, there's logic in mongoc_cluster_sendv_to_server to check a 1-second-old socket to see if it's closed (which is not in any spec), and check a 5-second-old socket with an ismaster (a misinterpretation of SDAM). If either check fails then mongoc_cluster_sendv_to_server returns an error, which is useless. Remove those checks from mongoc_cluster_sendv_to_server. | |||||||||||||||||
| Comment by Jeremy Mikola [ 06/Jun/17 ] | |||||||||||||||||
This may be problematic if the connection was dropped as a result a replica set failover, where a primary has stepped down to become a secondary. In that case, libmongoc thinks it has selected a primary (based on the last topology state) and after reconnecting it may be a secondary. In this case, we don't want to return a socket for a secondary and should instead report an error. | |||||||||||||||||
| Comment by Jeremy Mikola [ 01/Jun/17 ] | |||||||||||||||||
|
Another observation: |