-
Type: Build Failure
-
Resolution: Unresolved
-
Priority: Unknown
-
None
-
Affects Version/s: None
-
Component/s: None
-
Python Drivers
-
Not Needed
Name of Failure:
FAILURE: pymongo.errors.AutoReconnect: localhost:27017: connection closed (configured timeouts: connectTimeoutMS: 20000.0ms) ()
Link to task:
Context of when and why the failure occurred:
Across PyPy3.10 builds we are seeing failures due to connections that are returning with zero bytes read even though the socket should be in good condition.
I was able to reproduce locally with the following setup:
AUTH=auth SSL=ssl TOPOLOGY=sharded_cluster make run-server
pypy3 -m venv .venv
source .venv/bin/activate
pip install -e ".[test]"
AUTH=auth SSL=ssl bash .evergreen/run-tests.sh test/test_transactions_unified.py
I added some debug logs, and found that the socket is checked in and out of the pool up to the point when we try to remove the configured fail point, and then it returns early from a read. I logged the id of the underlying socket on check-in, checkout, command execution, and failure.
command {'saslContinue': 1, 'conversationId': 1, 'payload': Binary(b'c=biws,r=KrhVKs62KirjxlOcPzSxYqN9rxgCMFKyydzFz063tb8=5yL9hky1OGijJ8QRR7K1sp1Q0v/t4Yna,p=pWLGuqeLOHwMzFNz2pTQWHGpmuYT4A8UMYB+CfrHu6A=', 0)} 4571407880 got conn 4571423392 False checking out 4571423392 False command SON([('configureFailPoint', 'failCommand'), ('mode', {'times': 1}), ('data', {'failCommands': ['update'], 'closeConnection': True}), ('lsid', {'id': Binary(b'\x11\x92\xdb\xf3LgJs\x924\x91P|8\xe0\x8a', 4)}), ('$clusterTime', {'clusterTime': Timestamp(1734403140, 26), 'signature': {'hash': b'\x1f\xfe7V.\xed%\xec\x84\xfd\xe7\xca\x12\x9b\x12M\xe2\x03=\xcf', 'keyId': 7449023134507728918}})]) 4571407880 checking in 4571423392 False got conn 4571258600 False checking out 4571258600 False come on <ssl.SSLSocket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 60522)> 4571258600 True checking in 4571258600 True got conn 4571423392 False checking out 4571423392 False command {'configureFailPoint': 'failCommand', 'mode': 'off', 'lsid': {'id': Binary(b'\x11\x92\xdb\xf3LgJs\x924\x91P|8\xe0\x8a', 4)}, '$clusterTime': {'clusterTime': Timestamp(1734403140, 26), 'signature': {'hash': b'\x1f\xfe7V.\xed%\xec\x84\xfd\xe7\xca\x12\x9b\x12M\xe2\x03=\xcf', 'keyId': 7449023134507728918}}} 4571407880 failing here: <ssl.SSLSocket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 60524), raddr=('127.0.0.1', 27017)> 4571423392 True checking in 4571423392 True
Stack trace:
[2024/12/09 15:22:23.347] FAILURE: pymongo.errors.AutoReconnect: localhost:27017: connection closed (configured timeouts: connectTimeoutMS: 20000.0ms) () [2024/12/09 15:22:23.347] self = Connection(<ssl.SSLSocket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>) CLOSED at 96743112 [2024/12/09 15:22:23.347] dbname = 'admin' [2024/12/09 15:22:23.347] spec = {'$db': 'admin', 'configureFailPoint': 'failCommand', 'lsid': {'id': Binary(b'\xc6\xe0\xf0\x19I\xacJ?\xb4|\xd6?\xbbz\xcc\xb1', 4)}, 'mode': 'off'} [2024/12/09 15:22:23.347] read_preference = Primary() [2024/12/09 15:22:23.347] codec_options = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_e...ne, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME) [2024/12/09 15:22:23.347] check = True, allowable_errors = None, read_concern = None, write_concern = None [2024/12/09 15:22:23.347] parse_write_concern_error = False, collation = None [2024/12/09 15:22:23.347] session = <pymongo.synchronous.client_session.ClientSession object at 0x0000000017548720> [2024/12/09 15:22:23.347] client = MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=None, tls=True, tl...en/x509gen/client.pem', tlscafile='/data/mci/8c84a16d759062141ee089e39efa6472/drivers-tools/.evergreen/x509gen/ca.pem') [2024/12/09 15:22:23.347] retryable_write = False, publish_events = True, user_fields = None [2024/12/09 15:22:23.347] exhaust_allowed = False [2024/12/09 15:22:23.347] @_handle_reauth [2024/12/09 15:22:23.347] def command( [2024/12/09 15:22:23.347] self, [2024/12/09 15:22:23.347] dbname: str, [2024/12/09 15:22:23.347] spec: MutableMapping[str, Any], [2024/12/09 15:22:23.347] read_preference: _ServerMode = ReadPreference.PRIMARY, [2024/12/09 15:22:23.347] codec_options: CodecOptions = DEFAULT_CODEC_OPTIONS, [2024/12/09 15:22:23.347] check: bool = True, [2024/12/09 15:22:23.347] allowable_errors: Optional[Sequence[Union[str, int]]] = None, [2024/12/09 15:22:23.347] read_concern: Optional[ReadConcern] = None, [2024/12/09 15:22:23.347] write_concern: Optional[WriteConcern] = None, [2024/12/09 15:22:23.347] parse_write_concern_error: bool = False, [2024/12/09 15:22:23.347] collation: Optional[_CollationIn] = None, [2024/12/09 15:22:23.347] session: Optional[ClientSession] = None, [2024/12/09 15:22:23.347] client: Optional[MongoClient] = None, [2024/12/09 15:22:23.347] retryable_write: bool = False, [2024/12/09 15:22:23.347] publish_events: bool = True, [2024/12/09 15:22:23.347] user_fields: Optional[Mapping[str, Any]] = None, [2024/12/09 15:22:23.347] exhaust_allowed: bool = False, [2024/12/09 15:22:23.347] ) -> dict[str, Any]: [2024/12/09 15:22:23.347] """Execute a command or raise an error. [2024/12/09 15:22:23.347] [2024/12/09 15:22:23.347] :param dbname: name of the database on which to run the command [2024/12/09 15:22:23.347] :param spec: a command document as a dict, SON, or mapping object [2024/12/09 15:22:23.347] :param read_preference: a read preference [2024/12/09 15:22:23.347] :param codec_options: a CodecOptions instance [2024/12/09 15:22:23.347] :param check: raise OperationFailure if there are errors [2024/12/09 15:22:23.347] :param allowable_errors: errors to ignore if `check` is True [2024/12/09 15:22:23.347] :param read_concern: The read concern for this command. [2024/12/09 15:22:23.347] :param write_concern: The write concern for this command. [2024/12/09 15:22:23.347] :param parse_write_concern_error: Whether to parse the [2024/12/09 15:22:23.347] ``writeConcernError`` field in the command response. [2024/12/09 15:22:23.347] :param collation: The collation for this command. [2024/12/09 15:22:23.347] :param session: optional ClientSession instance. [2024/12/09 15:22:23.347] :param client: optional MongoClient for gossipping $clusterTime. [2024/12/09 15:22:23.347] :param retryable_write: True if this command is a retryable write. [2024/12/09 15:22:23.347] :param publish_events: Should we publish events for this command? [2024/12/09 15:22:23.347] :param user_fields: Response fields that should be decoded [2024/12/09 15:22:23.347] using the TypeDecoders from codec_options, passed to [2024/12/09 15:22:23.347] bson._decode_all_selective. [2024/12/09 15:22:23.347] """ [2024/12/09 15:22:23.347] self.validate_session(client, session) [2024/12/09 15:22:23.347] session = _validate_session_write_concern(session, write_concern) [2024/12/09 15:22:23.347] [2024/12/09 15:22:23.347] # Ensure command name remains in first place. [2024/12/09 15:22:23.347] if not isinstance(spec, ORDERED_TYPES): # type:ignore[arg-type] [2024/12/09 15:22:23.347] spec = dict(spec) [2024/12/09 15:22:23.347] [2024/12/09 15:22:23.347] if not (write_concern is None or write_concern.acknowledged or collation is None): [2024/12/09 15:22:23.347] raise ConfigurationError("Collation is unsupported for unacknowledged writes.") [2024/12/09 15:22:23.347] [2024/12/09 15:22:23.347] self.add_server_api(spec) [2024/12/09 15:22:23.347] if session: [2024/12/09 15:22:23.347] session._apply_to(spec, retryable_write, read_preference, self) [2024/12/09 15:22:23.347] self.send_cluster_time(spec, session, client) [2024/12/09 15:22:23.347] listeners = self.listeners if publish_events else None [2024/12/09 15:22:23.347] unacknowledged = bool(write_concern and not write_concern.acknowledged) [2024/12/09 15:22:23.347] if self.op_msg_enabled: [2024/12/09 15:22:23.347] self._raise_if_not_writable(unacknowledged) [2024/12/09 15:22:23.347] try: [2024/12/09 15:22:23.347] > return command( [2024/12/09 15:22:23.347] self, [2024/12/09 15:22:23.347] dbname, [2024/12/09 15:22:23.347] spec, [2024/12/09 15:22:23.347] self.is_mongos, [2024/12/09 15:22:23.347] read_preference, [2024/12/09 15:22:23.347] codec_options, [2024/12/09 15:22:23.347] session, [2024/12/09 15:22:23.347] client, [2024/12/09 15:22:23.347] check, [2024/12/09 15:22:23.347] allowable_errors, [2024/12/09 15:22:23.347] self.address, [2024/12/09 15:22:23.347] listeners, [2024/12/09 15:22:23.347] self.max_bson_size, [2024/12/09 15:22:23.347] read_concern, [2024/12/09 15:22:23.347] parse_write_concern_error=parse_write_concern_error, [2024/12/09 15:22:23.347] collation=collation, [2024/12/09 15:22:23.347] compression_ctx=self.compression_context, [2024/12/09 15:22:23.347] use_op_msg=self.op_msg_enabled, [2024/12/09 15:22:23.347] unacknowledged=unacknowledged, [2024/12/09 15:22:23.347] user_fields=user_fields, [2024/12/09 15:22:23.347] exhaust_allowed=exhaust_allowed, [2024/12/09 15:22:23.347] write_concern=write_concern, [2024/12/09 15:22:23.347] ) [2024/12/09 15:22:23.347] pymongo/synchronous/pool.py:536: [2024/12/09 15:22:23.347] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ [2024/12/09 15:22:23.347] pymongo/synchronous/network.py:203: in command [2024/12/09 15:22:23.347] reply = receive_message(conn, request_id) [2024/12/09 15:22:23.347] pymongo/synchronous/network.py:315: in receive_message [2024/12/09 15:22:23.347] length, _, response_to, op_code = _UNPACK_HEADER(receive_data(conn, 16, deadline)) [2024/12/09 15:22:23.347] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ [2024/12/09 15:22:23.347] conn = Connection(<ssl.SSLSocket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>) CLOSED at 96743112 [2024/12/09 15:22:23.347] length = 16, deadline = None [2024/12/09 15:22:23.347] def receive_data(conn: Connection, length: int, deadline: Optional[float]) -> memoryview: [2024/12/09 15:22:23.347] buf = bytearray(length) [2024/12/09 15:22:23.347] mv = memoryview(buf) [2024/12/09 15:22:23.347] bytes_read = 0 [2024/12/09 15:22:23.347] # To support cancelling a network read, we shorten the socket timeout and [2024/12/09 15:22:23.347] # check for the cancellation signal after each timeout. Alternatively we [2024/12/09 15:22:23.347] # could close the socket but that does not reliably cancel recv() calls [2024/12/09 15:22:23.347] # on all OSes. [2024/12/09 15:22:23.347] orig_timeout = conn.conn.gettimeout() [2024/12/09 15:22:23.347] try: [2024/12/09 15:22:23.347] while bytes_read < length: [2024/12/09 15:22:23.347] if deadline is not None: [2024/12/09 15:22:23.347] # CSOT: Update timeout. When the timeout has expired perform one [2024/12/09 15:22:23.347] # final non-blocking recv. This helps avoid spurious timeouts when [2024/12/09 15:22:23.347] # the response is actually already buffered on the client. [2024/12/09 15:22:23.347] short_timeout = min(max(deadline - time.monotonic(), 0), _POLL_TIMEOUT) [2024/12/09 15:22:23.347] else: [2024/12/09 15:22:23.347] short_timeout = _POLL_TIMEOUT [2024/12/09 15:22:23.347] conn.set_conn_timeout(short_timeout) [2024/12/09 15:22:23.347] try: [2024/12/09 15:22:23.347] chunk_length = conn.conn.recv_into(mv[bytes_read:]) [2024/12/09 15:22:23.347] except BLOCKING_IO_ERRORS: [2024/12/09 15:22:23.347] if conn.cancel_context.cancelled: [2024/12/09 15:22:23.347] raise _OperationCancelled("operation cancelled") from None [2024/12/09 15:22:23.347] # We reached the true deadline. [2024/12/09 15:22:23.347] raise socket.timeout("timed out") from None [2024/12/09 15:22:23.347] except socket.timeout: [2024/12/09 15:22:23.347] if conn.cancel_context.cancelled: [2024/12/09 15:22:23.347] raise _OperationCancelled("operation cancelled") from None [2024/12/09 15:22:23.347] continue [2024/12/09 15:22:23.347] except OSError as exc: [2024/12/09 15:22:23.347] if conn.cancel_context.cancelled: [2024/12/09 15:22:23.347] raise _OperationCancelled("operation cancelled") from None [2024/12/09 15:22:23.347] if _errno_from_exception(exc) == errno.EINTR: [2024/12/09 15:22:23.347] continue [2024/12/09 15:22:23.347] raise [2024/12/09 15:22:23.347] if chunk_length == 0: [2024/12/09 15:22:23.347] > raise OSError("connection closed") [2024/12/09 15:22:23.347] E OSError: connection closed [2024/12/09 15:22:23.347] pymongo/network_layer.py:356: OSError [2024/12/09 15:22:23.347] The above exception was the direct cause of the following exception: [2024/12/09 15:22:23.347] self = <test.test_monitoring.TestCommandMonitoring testMethod=test_bulk_write_command_network_error> [2024/12/09 15:22:23.347] @client_context.require_failCommand_fail_point [2024/12/09 15:22:23.347] def test_bulk_write_command_network_error(self): [2024/12/09 15:22:23.347] coll = self.client.pymongo_test.test [2024/12/09 15:22:23.347] self.listener.reset() [2024/12/09 15:22:23.347] [2024/12/09 15:22:23.347] insert_network_error = { [2024/12/09 15:22:23.347] "configureFailPoint": "failCommand", [2024/12/09 15:22:23.347] "mode": {"times": 1}, [2024/12/09 15:22:23.347] "data": { [2024/12/09 15:22:23.347] "failCommands": ["insert"], [2024/12/09 15:22:23.347] "closeConnection": True, [2024/12/09 15:22:23.347] }, [2024/12/09 15:22:23.347] } [2024/12/09 15:22:23.347] > with self.fail_point(insert_network_error): [2024/12/09 15:22:23.347] test/test_monitoring.py:990: [2024/12/09 15:22:23.347] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ [2024/12/09 15:22:23.347] /opt/python/pypy3.10/lib/pypy3.10/contextlib.py:142: in __exit__ [2024/12/09 15:22:23.347] next(self.gen) [2024/12/09 15:22:23.347] test/__init__.py:892: in fail_point [2024/12/09 15:22:23.347] client_context.client.admin.command( [2024/12/09 15:22:23.347] pymongo/_csot.py:119: in csot_wrapper [2024/12/09 15:22:23.347] return func(self, *args, **kwargs) [2024/12/09 15:22:23.347] pymongo/synchronous/database.py:930: in command [2024/12/09 15:22:23.347] return self._command( [2024/12/09 15:22:23.347] pymongo/synchronous/database.py:770: in _command [2024/12/09 15:22:23.347] return conn.command( [2024/12/09 15:22:23.347] pymongo/synchronous/helpers.py:45: in inner [2024/12/09 15:22:23.347] return func(*args, **kwargs) [2024/12/09 15:22:23.347] pymongo/synchronous/pool.py:564: in command [2024/12/09 15:22:23.347] self._raise_connection_failure(error) [2024/12/09 15:22:23.347] pymongo/synchronous/pool.py:763: in _raise_connection_failure [2024/12/09 15:22:23.347] _raise_connection_failure(self.address, error, timeout_details=details) [2024/12/09 15:22:23.347] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ [2024/12/09 15:22:23.347] address = ('localhost', 27017), error = OSError('connection closed') [2024/12/09 15:22:23.347] msg_prefix = None, timeout_details = {'connectTimeoutMS': 20000.0} [2024/12/09 15:22:23.347] def _raise_connection_failure( [2024/12/09 15:22:23.347] address: Any, [2024/12/09 15:22:23.347] error: Exception, [2024/12/09 15:22:23.347] msg_prefix: Optional[str] = None, [2024/12/09 15:22:23.347] timeout_details: Optional[dict[str, float]] = None, [2024/12/09 15:22:23.347] ) -> NoReturn: [2024/12/09 15:22:23.347] """Convert a socket.error to ConnectionFailure and raise it.""" [2024/12/09 15:22:23.347] host, port = address [2024/12/09 15:22:23.347] # If connecting to a Unix socket, port will be None. [2024/12/09 15:22:23.347] if port is not None: [2024/12/09 15:22:23.347] msg = "%s:%d: %s" % (host, port, error) [2024/12/09 15:22:23.347] else: [2024/12/09 15:22:23.347] msg = f"{host}: {error}" [2024/12/09 15:22:23.347] if msg_prefix: [2024/12/09 15:22:23.347] msg = msg_prefix + msg [2024/12/09 15:22:23.347] if "configured timeouts" not in msg: [2024/12/09 15:22:23.347] msg += format_timeout_details(timeout_details) [2024/12/09 15:22:23.347] if isinstance(error, socket.timeout): [2024/12/09 15:22:23.347] raise NetworkTimeout(msg) from error [2024/12/09 15:22:23.347] elif isinstance(error, SSLError) and "timed out" in str(error): [2024/12/09 15:22:23.347] # Eventlet does not distinguish TLS network timeouts from other [2024/12/09 15:22:23.347] # SSLErrors (https://github.com/eventlet/eventlet/issues/692). [2024/12/09 15:22:23.347] # Luckily, we can work around this limitation because the phrase [2024/12/09 15:22:23.347] # 'timed out' appears in all the timeout related SSLErrors raised. [2024/12/09 15:22:23.347] raise NetworkTimeout(msg) from error [2024/12/09 15:22:23.347] else: [2024/12/09 15:22:23.347] > raise AutoReconnect(msg) from error [2024/12/09 15:22:23.347] E pymongo.errors.AutoReconnect: localhost:27017: connection closed (configured timeouts: connectTimeoutMS: 20000.0ms) [2024/12/09 15:22:23.347] pymongo/synchronous/pool.py:211: AutoReconnect
- is caused by
-
PYTHON-4292 Perf regression when reading large SSL responses from multiple threads
- Closed
- related to
-
PYTHON-4982 Test Failure - network errors running configureFailPoint
- Backlog