[CDRIVER-2174] _mongoc_cluster_check_interval() should invalidate nodes after detecting a closed socket Created: 02/Jun/17 Updated: 28/Oct/23 Resolved: 18/Jun/17 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc |
| Affects Version/s: | 1.6.3 |
| Fix Version/s: | 1.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jeremy Mikola | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
Quoting the SDAM spec on Network error when reading or writing:
This doesn't jibe with the implementation of _mongoc_cluster_check_interval(). Before the logic for socketCheckIntervalMS (i.e. isMaster for sockets last used more than five seconds ago by default), there is a basic check for a closed socket if the socket was last used more than a second ago (CHECK_CLOSED_DURATION_MSEC is defined as 1000):
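For readers without the source at hand, the two idle thresholds described above can be modeled roughly as follows. This is a toy sketch, not the libmongoc code: only CHECK_CLOSED_DURATION_MSEC and its value of 1000 come from this report, the five-second default is taken from the description above, and idle_checks_t, idle_checks_for() and DEFAULT_SOCKET_CHECK_INTERVAL_MSEC are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in this report. */
#define CHECK_CLOSED_DURATION_MSEC 1000
/* Assumed default for socketCheckIntervalMS (five seconds, per the report). */
#define DEFAULT_SOCKET_CHECK_INTERVAL_MSEC 5000

/* Hypothetical summary of which checks an idle socket would receive. */
typedef struct {
   bool probe_closed; /* cheap "has the peer closed this socket?" probe */
   bool run_ismaster; /* full isMaster round trip */
} idle_checks_t;

/* Decide which checks apply, given when the socket was last used.
 * The closed-socket probe is evaluated first, independently of the
 * socketCheckIntervalMS logic. */
idle_checks_t
idle_checks_for (int64_t now_msec,
                 int64_t last_used_msec,
                 int64_t socket_check_interval_msec)
{
   idle_checks_t checks;
   int64_t idle_msec = now_msec - last_used_msec;

   assert (idle_msec >= 0);
   checks.probe_closed = idle_msec > CHECK_CLOSED_DURATION_MSEC;
   checks.run_ismaster = idle_msec > socket_check_interval_msec;
   return checks;
}
```

Under these assumptions, a socket idle for two seconds gets only the closed-socket probe, while one idle for six seconds gets the probe followed by isMaster.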
I would expect mongoc_topology_invalidate_server() to be called here, as it often is in other places following mongoc_cluster_disconnect_node() on socket errors. I attempted to test this using the following PHP script:
If I start this script with a running MongoDB server on localhost, and restart it during the test, we get the following output (abridged to the relevant iterations around the restart):
Notice that two exceptions are thrown before we can recover from the socket error. If it's not immediately obvious, recovery occurs during the last iteration, which has a higher elapsed time. Also note that server selection succeeded immediately following the first "Stream is closed" error because the node was never invalidated. If we revise the original script to use a smaller usleep() interval to ensure we don't surpass CHECK_CLOSED_DURATION_MSEC, we only encounter one exception:
I noted that this CHECK_CLOSED_DURATION_MSEC logic is specific to the single-threaded SDAM implementation, but I was unable to correlate it with any section of the SDAM spec. Do you have a reference for it? That said, I believe we can fix this behavior by adding a mongoc_topology_invalidate_server() call after bson_set_error() and before return false in the mongoc_stream_check_closed() condition cited earlier. |
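The effect of the proposed change can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not the driver's implementation: every toy_* name is hypothetical, the "disconnect" and "invalidate" steps only stand in for mongoc_cluster_disconnect_node() and mongoc_topology_invalidate_server(), and the model assumes that server selection returns immediately whenever the topology still considers the server known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are libmongoc types. */
typedef enum { TOY_SERVER_UNKNOWN, TOY_SERVER_STANDALONE } toy_server_type_t;

typedef struct {
   bool connected;         /* the cluster node still holds a stream */
   bool peer_closed;       /* what a closed-socket probe would report */
   toy_server_type_t type; /* the topology's view of the server */
} toy_server_t;

/* Models the closed-socket branch: drop the node, optionally mark the
 * server unknown (the proposed fix), and report failure to the caller. */
bool
toy_check_closed (toy_server_t *server, bool invalidate_on_close)
{
   assert (server != NULL);
   if (server->connected && server->peer_closed) {
      server->connected = false; /* stands in for disconnecting the node */
      if (invalidate_on_close) {
         /* stands in for invalidating the server in the topology */
         server->type = TOY_SERVER_UNKNOWN;
      }
      return false;
   }
   return true;
}

/* Assumption: selection succeeds without a rescan only for a known server. */
bool
toy_selectable_without_rescan (const toy_server_t *server)
{
   assert (server != NULL);
   return server->type != TOY_SERVER_UNKNOWN;
}
```

Without invalidation the model keeps treating the server as selectable after the closed socket is detected, which matches the stale selection described above; with invalidation the next selection has to rediscover the server first.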
| Comments |
| Comment by A. Jesse Jiryu Davis [ 18/Jun/17 ] |
|
Closed along with https://github.com/mongodb/mongo-c-driver/commit/9f9934dcd9d7de3e7f8db315145106df44edee23 |
| Comment by A. Jesse Jiryu Davis [ 07/Jun/17 ] |
|
Same in mongoc_cluster_run_command_internal. |