[CDRIVER-3116] Do not disconnect sockets in mongoc_client_reset() Created: 07/May/19  Updated: 28/Oct/23  Resolved: 08/Nov/19

Status: Closed
Project: C Driver
Component/s: docs, libmongoc, tls
Affects Version/s: 1.14.0
Fix Version/s: 1.16.0

Type: Bug Priority: Major - P3
Reporter: Jeremy Mikola Assignee: Jeremy Mikola
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by PHPC-1274 Reset libmongoc client after forking ... Closed
Related
related to CDRIVER-3438 Destroy exhaust cursor socket in mong... Closed
related to CDRIVER-3391 Document fork safety Closed
is related to CDRIVER-2857 Provide a reset method to be called o... Closed

 Description   

The original summary for this issue was "Calling mongoc_client_reset() in child may affect SSL sockets in parent". This issue can be addressed by removing the mongoc_cluster_disconnect call in mongoc_client_reset.


While integrating CDRIVER-2857 in PHPC-1274, I observed reproducible test failures when connected to a standalone server using SSL. The tests in question do the following:

  • Connect to a standalone using SSL
  • Insert three documents into a collection
  • Query the collection with batchSize=2. Query execution in PHP will entail creating a cursor with the first batch of results.
  • Fork the process, such that the child immediately exits
  • The parent will wait on the child to exit, and then proceed to iterate the cursor to completion

This test is implemented in bug1274-001.phpt. It fails due to the parent's connection being closed sometime between forking and iterating the cursor (i.e. executing a getMore).

In this first log, I record the script output (above steps) and also include a relevant portion of the server log that shows the connection terminating when the child exits:

find command specifies batchSize: 2
Child exits
Parent waited for child to exit
getMore command specifies batchSize: 2
 
Fatal error: Uncaught MongoDB\Driver\Exception\ConnectionTimeoutException: Failed to send "getMore" command with database "phongo": Failed to read 4 bytes: socket error or timeout in /home/jmikola/workspace/mongodb/phpc/tests/cursor/bug1274-001.php:59
 
2019-05-07T09:56:47.264-0400 I  NETWORK  [listener] connection accepted from 127.0.0.1:40904 #53 (1 connection now open)
2019-05-07T09:56:47.283-0400 I  NETWORK  [conn53] received client metadata from 127.0.0.1:40904 conn53: { driver: { name: "mongoc / ext-mongodb:PHP", version: "1.14.0 / 1.6.0alpha2-dev" }, os: { type: "Linux", name: "Ubuntu", version: "18.04", architecture: "x86_64" }, platform: "cfg=0xd15ea8e9 posix=200809 stdc=201112 CC=GCC 7.4.0 CFLAGS="-g  -O0" LDFLAGS="" / PHP 7.2.12" }
2019-05-07T09:56:47.284-0400 I  STORAGE  [conn53] createCollection: phongo.cursor_bug1274_001 with generated UUID: 6a9426ee-d4d3-420a-ba6d-0f1f77867eb0
2019-05-07T09:56:47.287-0400 I  STORAGE  [conn53] Registering catalog entry phongo.cursor_bug1274_001 with UUID 6a9426ee-d4d3-420a-ba6d-0f1f77867eb0
2019-05-07T09:56:47.287-0400 I  STORAGE  [conn53] Registering collection object phongo.cursor_bug1274_001 with UUID 6a9426ee-d4d3-420a-ba6d-0f1f77867eb0
2019-05-07T09:56:47.289-0400 I  INDEX    [conn53] index build: done building index _id_ on ns phongo.cursor_bug1274_001
2019-05-07T09:56:47.293-0400 I  NETWORK  [conn53] end connection 127.0.0.1:40904 (0 connections now open)
2019-05-07T09:56:47.784-0400 I  NETWORK  [listener] connection accepted from 127.0.0.1:40906 #54 (1 connection now open)
2019-05-07T09:56:47.803-0400 I  NETWORK  [conn54] end connection 127.0.0.1:40906 (0 connections now open)

In this second log, I have the child sleep five seconds before exiting and disabled the waitpid() in the parent so that iteration happens before the child exits:

find command specifies batchSize: 2
getMore command specifies batchSize: 2
Parent fully iterated cursor for 3 documents
===DONE===
Child exits
 
2019-05-07T09:57:49.242-0400 I  NETWORK  [listener] connection accepted from 127.0.0.1:40910 #56 (1 connection now open)
2019-05-07T09:57:49.262-0400 I  NETWORK  [conn56] received client metadata from 127.0.0.1:40910 conn56: { driver: { name: "mongoc / ext-mongodb:PHP", version: "1.14.0 / 1.6.0alpha2-dev" }, os: { type: "Linux", name: "Ubuntu", version: "18.04", architecture: "x86_64" }, platform: "cfg=0xd15ea8e9 posix=200809 stdc=201112 CC=GCC 7.4.0 CFLAGS="-g  -O0" LDFLAGS="" / PHP 7.2.12" }
2019-05-07T09:57:49.263-0400 I  STORAGE  [conn56] createCollection: phongo.cursor_bug1274_001 with generated UUID: 055f23dc-1ac4-40e9-8242-7e23c1d400fa
2019-05-07T09:57:49.264-0400 I  STORAGE  [conn56] Registering catalog entry phongo.cursor_bug1274_001 with UUID 055f23dc-1ac4-40e9-8242-7e23c1d400fa
2019-05-07T09:57:49.264-0400 I  STORAGE  [conn56] Registering collection object phongo.cursor_bug1274_001 with UUID 055f23dc-1ac4-40e9-8242-7e23c1d400fa
2019-05-07T09:57:49.265-0400 I  INDEX    [conn56] index build: done building index _id_ on ns phongo.cursor_bug1274_001
2019-05-07T09:57:49.270-0400 I  NETWORK  [conn56] end connection 127.0.0.1:40910 (0 connections now open)

In both cases, it looks like resetting an SSL connection from one process affects the other. I'm not sure if this is expected due to SSL interfering with being able to have two client sockets connected to the same server socket (shared file descriptor through the fork). Perhaps SSL is dropping the entire connection once it observes the child hang up?



 Comments   
Comment by Githook User [ 08/Nov/19 ]

Author:

{'name': 'Kevin Albertson', 'username': 'kevinAlbs', 'email': 'kevin.albertson@mongodb.com'}

Message: CDRIVER-3116 remove unused fn
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/622b3c896627fd33268bd43a08f5195e2542761b

Comment by Githook User [ 08/Nov/19 ]

Author:

{'name': 'Jeremy Mikola', 'username': 'jmikola', 'email': 'jmikola@gmail.com'}

Message: CDRIVER-3116 do not disconnect in mongoc_client_reset
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/4f4268574b63677e8c5e0646c3d169769fb07a50

Comment by Jeremy Mikola [ 22/Oct/19 ]

I tested this using the manager-reset-003.phpt test in mongodb/mongo-php-driver#1027 for PHPC-1274.

The test connects to a server using TLS, inserts three documents, and executes a find with batchSize=2 to obtain a cursor on the result set. It then forks and has the parent process wait on the child, which simply calls mongo_client_reset() and terminates. The parent then continues iteration of its cursor and asserts that a socket exception occurs on the subsequent getMore.

I modified mongoc_client_reset to no longer call mongoc_cluster_disconnect and observed that the test now fails because the parent's getMore succeeds. This confirms our hypothesis that we can guarantee consistent behavior for TLS and non-TLS servers by simply omitting the explicit disconnect. Therefore, I propose the following:

  • Repurpose this ticket to modify mongoc_client_reset to no longer call mongoc_cluster_disconnect. It will only increment the client's generation counter and clear the session pool.
  • mongoc_cursor_destroy will continue to check the client generation before killing a live cursor.
  • mongoc_client_session_destroy will continue to check the client generation before returning a session to the pool.

I've shared additional proposals for PHP in this comment in PHPC-1274.

Comment by Samantha Ritter (Inactive) [ 25/Sep/19 ]

The solution in PHP was going to add PID check logic in each driver operation to ensure we reset before our next application-invoked operation.

That sounds terrible.

Yes, unfortunately it does seem like it's out of our control. The best we can do is probably stress in the documentation that, when using SSL, applications should close their connections before forking.

Comment by Jeremy Mikola [ 24/Sep/19 ]

Is it a reasonable workaround for you to close connections in the parent before forking? We should at the very least update our documentation around mongoc_client_reset() to mention how it fails with SSL.

I don't think this is possible. The entire reason we came up with the design for mongoc_client_reset() was because our drivers don't know when the application will fork. The solution in PHP was going to add PID check logic in each driver operation to ensure we reset before our next application-invoked operation.

That said, I agree that this issue would be better repurposed as a task to update the documentation for mongoc_client_reset() since this appears to be outside of our control.

Comment by Samantha Ritter (Inactive) [ 24/Sep/19 ]

Right, but even though the child and parent aren't intended to both use the socket, they do still "share" it by virtue of the fork happening.  The OpenSSL documentation around this behavior is not forthcoming.  But, from what I gather from threads like this and this, OpenSSL caches session state with its connections, and when you fork, this information will also get duplicated to the child.  So, even if the child process does not intend to use the socket, closing it messes with this shared state.  Perhaps the issue comes from the state being sent to the receiving server.  Without SSL, the child would close their socket and it would do nothing; maybe with SSL, the child causes the state of the shared SSL connection to change on the server, so that subsequent messages sent by the parent fail.  That's my best guess at the moment.

It seems like the best thing to do is to wait until after forking to do the SSL handshake, which means the parent cannot have continuous connections across forking.  I don't think we could alter mongoc_client_reset() to behave correctly here, without leaving dangling sockets around.

 

Is it a reasonable workaround for you to close connections in the parent before forking?  We should at the very least update our documentation around mongoc_client_reset() to mention how it fails with SSL.

Comment by Jeremy Mikola [ 23/Sep/19 ]

samantha.ritter: I was most certainly using OpenSSL when I reported this issue, as that's the default option for Linux (and my build script doesn't customize it). To clarify, PHP is not attempting to share connections across child and parent processes. This is related to the post-fork reset functionality, where we want child processes to abandon sockets that were opened by a parent process.

I think the unexpected behavior was that when SSL is used, resetting the client affects sockets from one process affects the other. The original test presented a case where a parent loses its connection because the child reset after being forked. The intention of opening this issue was to confirm whether something about SSL means we can't rely on the reset behavior from CDRIVER-2857 as we would normally for a non-SSL connection.

Comment by Samantha Ritter (Inactive) [ 23/Sep/19 ]

jmikola why does the PHP driver needs to share SSL connections between a child and parent process? From some research it looks like this is impossible to do with some SSL implementations, including OpenSSL. Can you share what SSL implementation you were running with?

Comment by Jeremy Mikola [ 23/Aug/19 ]

Moving to 1.16 since we missed the boat for 1.15.

Generated at Wed Feb 07 21:17:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.