[CDRIVER-3116] Do not disconnect sockets in mongoc_client_reset() Created: 07/May/19 Updated: 28/Oct/23 Resolved: 08/Nov/19 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | docs, libmongoc, tls |
| Affects Version/s: | 1.14.0 |
| Fix Version/s: | 1.16.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jeremy Mikola | Assignee: | Jeremy Mikola |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Description |
|
The original summary for this issue was "Calling mongoc_client_reset() in child may affect SSL sockets in parent". This issue can be addressed by removing the mongoc_cluster_disconnect call in mongoc_client_reset. While integrating
This test is implemented in bug1274-001.phpt. It fails due to the parent's connection being closed sometime between forking and iterating the cursor (i.e. executing a getMore). In this first log, I record the script output (above steps) and also include a relevant portion of the server log that shows the connection terminating when the child exits:
In this second log, I have the child sleep five seconds before exiting and disabled the waitpid() in the parent so that iteration happens before the child exits:
In both cases, it looks like resetting an SSL connection from one process affects the other. I'm not sure if this is expected due to SSL interfering with being able to have two client sockets connected to the same server socket (shared file descriptor through the fork). Perhaps SSL is dropping the entire connection once it observes the child hang up? |
| Comments |
| Comment by Githook User [ 08/Nov/19 ] |
|
Author: {'name': 'Kevin Albertson', 'username': 'kevinAlbs', 'email': 'kevin.albertson@mongodb.com'}Message: |
| Comment by Githook User [ 08/Nov/19 ] |
|
Author: {'name': 'Jeremy Mikola', 'username': 'jmikola', 'email': 'jmikola@gmail.com'}Message: |
| Comment by Jeremy Mikola [ 22/Oct/19 ] |
|
I tested this using the manager-reset-003.phpt test in mongodb/mongo-php-driver#1027 for The test connects to a server using TLS, inserts three documents, and executes a find with batchSize=2 to obtain a cursor on the result set. It then forks and has the parent process wait on the child, which simply calls mongo_client_reset() and terminates. The parent then continues iteration of its cursor and asserts that a socket exception occurs on the subsequent getMore. I modified mongoc_client_reset to no longer call mongoc_cluster_disconnect and observed that the test now fails because the parent's getMore succeeds. This confirms our hypothesis that we can guarantee consistent behavior for TLS and non-TLS servers by simply omitting the explicit disconnect. Therefore, I propose the following:
I've shared additional proposals for PHP in this comment in |
| Comment by Samantha Ritter (Inactive) [ 25/Sep/19 ] |
That sounds terrible. Yes, unfortunately it does seem like it's out of our control. The best we can do is probably stress in the documentation that, when using SSL, applications should close their connections before forking. |
| Comment by Jeremy Mikola [ 24/Sep/19 ] |
I don't think this is possible. The entire reason we came up with the design for mongoc_client_reset() was because our drivers don't know when the application will fork. The solution in PHP was going to add PID check logic in each driver operation to ensure we reset before our next application-invoked operation. That said, I agree that this issue would be better repurposed as a task to update the documentation for mongoc_client_reset() since this appears to be outside of our control. |
| Comment by Samantha Ritter (Inactive) [ 24/Sep/19 ] |
|
Right, but even though the child and parent aren't intended to both use the socket, they do still "share" it by virtue of the fork happening. The OpenSSL documentation around this behavior is not forthcoming. But, from what I gather from threads like this and this, OpenSSL caches session state with its connections, and when you fork, this information will also get duplicated to the child. So, even if the child process does not intend to use the socket, closing it messes with this shared state. Perhaps the issue comes from the state being sent to the receiving server. Without SSL, the child would close their socket and it would do nothing; maybe with SSL, the child causes the state of the shared SSL connection to change on the server, so that subsequent messages sent by the parent fail. That's my best guess at the moment. It seems like the best thing to do is to wait until after forking to do the SSL handshake, which means the parent cannot have continuous connections across forking. I don't think we could alter mongoc_client_reset() to behave correctly here, without leaving dangling sockets around.
Is it a reasonable workaround for you to close connections in the parent before forking? We should at the very least update our documentation around mongoc_client_reset() to mention how it fails with SSL. |
| Comment by Jeremy Mikola [ 23/Sep/19 ] |
|
samantha.ritter: I was most certainly using OpenSSL when I reported this issue, as that's the default option for Linux (and my build script doesn't customize it). To clarify, PHP is not attempting to share connections across child and parent processes. This is related to the post-fork reset functionality, where we want child processes to abandon sockets that were opened by a parent process. I think the unexpected behavior was that when SSL is used, resetting the client affects sockets from one process affects the other. The original test presented a case where a parent loses its connection because the child reset after being forked. The intention of opening this issue was to confirm whether something about SSL means we can't rely on the reset behavior from |
| Comment by Samantha Ritter (Inactive) [ 23/Sep/19 ] |
|
jmikola why does the PHP driver needs to share SSL connections between a child and parent process? From some research it looks like this is impossible to do with some SSL implementations, including OpenSSL. Can you share what SSL implementation you were running with? |
| Comment by Jeremy Mikola [ 23/Aug/19 ] |
|
Moving to 1.16 since we missed the boat for 1.15. |