- Type: Bug
- Resolution: Unresolved
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
There are two flaky tests in discovery_and_monitoring.py:
- Error returned from connection pool clear with interruptInUseConnections=true is retryable
- Error returned from connection pool clear with interruptInUseConnections=true is retryable for write
The tests fail with an `_OperationCancelled` error. Stack trace below:
```
FAILURE: pymongo.errors._OperationCancelled: operation cancelled ()

self = <test.asynchronous.test_discovery_and_monitoring.TestUnifiedInterruptInUsePoolClear testMethod=test_Error_returned_from_connection_pool_clear_with_interruptInUseConnections=true_is_retryable>

    async def test_case(self):
>       await self.run_scenario(spec)

test\asynchronous\unified_format.py:1469:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test\asynchronous\unified_format.py:1411: in run_scenario
    await self._run_scenario(spec, uri)
test\asynchronous\unified_format.py:1446: in _run_scenario
    await self.run_operations(spec["operations"])
test\asynchronous\unified_format.py:1245: in run_operations
    await self.run_special_operation(op)
test\asynchronous\unified_format.py:1238: in run_special_operation
    await method(spec["arguments"])
test\asynchronous\unified_format.py:1188: in _testOperation_waitForThread
    raise thread.exc
test\asynchronous\utils_spec_runner.py:85: in run
    await work()
test\asynchronous\unified_format.py:982: in run_entity_operation
    result = await cmd(**dict(arguments))
test\asynchronous\unified_format.py:785: in _collectionOperation_find
    return await find_cursor.to_list()
pymongo\_csot.py:109: in csot_wrapper
    return await func(self, *args, **kwargs)
pymongo\asynchronous\cursor.py:1318: in to_list
    if not await self._next_batch(res, remaining):
pymongo\asynchronous\cursor.py:1274: in _next_batch
    if len(self._data) or await self._refresh():
pymongo\asynchronous\cursor.py:1210: in _refresh
    await self._send_message(q)
pymongo\asynchronous\cursor.py:1104: in _send_message
    response = await client._run_operation(
pymongo\_csot.py:109: in csot_wrapper
    return await func(self, *args, **kwargs)
pymongo\asynchronous\mongo_client.py:1763: in _run_operation
    return await self._retryable_read(
pymongo\asynchronous\mongo_client.py:1872: in _retryable_read
    return await self._retry_internal(
pymongo\_csot.py:109: in csot_wrapper
    return await func(self, *args, **kwargs)
pymongo\asynchronous\mongo_client.py:1828: in _retry_internal
    return await _ClientConnectionRetryable(
pymongo\asynchronous\mongo_client.py:2583: in run
    return await self._read() if self._is_read else await self._write()
pymongo\asynchronous\mongo_client.py:2739: in _read
    return await self._func(self._session, self._server, conn, read_pref)  # type: ignore
pymongo\asynchronous\mongo_client.py:1754: in _cmd
    return await server.run_operation(
pymongo\asynchronous\helpers.py:47: in inner
    return await func(*args, **kwargs)
pymongo\asynchronous\server.py:208: in run_operation
    reply = await conn.receive_message(request_id)
pymongo\asynchronous\pool.py:595: in receive_message
    self._raise_connection_failure(error)
pymongo\asynchronous\pool.py:591: in receive_message
    return await receive_message(self, request_id, self.max_message_size)
pymongo\asynchronous\network.py:324: in receive_message
    length, _, response_to, op_code = _UNPACK_HEADER(await async_receive_data(conn, 16, deadline))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

conn = AsyncConnection(<ssl.SSLSocket [closed] fd=-1, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>) CLOSED at 2045221770000
length = 16, deadline = None

    async def async_receive_data(
        conn: AsyncConnection, length: int, deadline: Optional[float]
    ) -> memoryview:
        sock = conn.conn
        sock_timeout = sock.gettimeout()
        timeout: Optional[Union[float, int]]
        if deadline:
            # When the timeout has expired perform one final check to
            # see if the socket is readable. This helps avoid spurious
            # timeouts on AWS Lambda and other FaaS environments.
            timeout = max(deadline - time.monotonic(), 0)
        else:
            timeout = sock_timeout

        sock.settimeout(0.0)
        loop = asyncio.get_running_loop()
        cancellation_task = create_task(_poll_cancellation(conn))
        try:
            if _HAVE_SSL and isinstance(sock, (SSLSocket, _sslConn)):
                read_task = create_task(_async_receive_ssl(sock, length, loop))  # type: ignore[arg-type]
            else:
                read_task = create_task(_async_receive(sock, length, loop))  # type: ignore[arg-type]
            tasks = [read_task, cancellation_task]
            try:
                done, pending = await asyncio.wait(
                    tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
                )
                for task in pending:
                    task.cancel()
                if pending:
                    await asyncio.wait(pending)
                if len(done) == 0:
                    raise socket.timeout("timed out")
                if read_task in done:
                    return read_task.result()
                print("RIGHT BEFORE OPERATION CANCELLED IS RAISED")
                print(cancellation_task in done)
>               raise _OperationCancelled("operation cancelled")
E               pymongo.errors._OperationCancelled: operation cancelled

pymongo\network_layer.py:287: _OperationCancelled
```
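The tail of the traceback shows the mechanism: `async_receive_data` races the socket read against a cancellation watcher, and the watcher wins when the pool is cleared with interruptInUseConnections=true. Below is a distilled sketch of that race for readers skimming the report; it is not pymongo's actual helper, and `receive_or_cancel` plus the `asyncio.Event` flag are stand-ins for `_poll_cancellation` and the connection's cancel context:

```python
import asyncio


async def receive_or_cancel(read_coro, cancelled: asyncio.Event):
    # Simplified version of the race in async_receive_data: the network read
    # competes with a watcher for the connection's cancellation flag. If the
    # watcher finishes first (e.g. the pool was cleared with
    # interruptInUseConnections=true), the read is abandoned and the caller
    # sees an "operation cancelled" error, even if bytes were about to arrive.
    read_task = asyncio.create_task(read_coro)
    cancel_task = asyncio.create_task(cancelled.wait())
    done, pending = await asyncio.wait(
        {read_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.wait(pending)
    if read_task in done:
        return read_task.result()
    raise RuntimeError("operation cancelled")  # pymongo raises _OperationCancelled here
```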
The test expects the following order of events (omitting some events at the beginning that appear to behave correctly): configure the fail point, start the find command, the find command fails, start the find command again (the retry), and the find command succeeds.
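For context, the spec test drives this sequence with a server fail point. The exact document lives in the spec JSON and is not reproduced here; a rough, illustrative sketch (the `configure_fail_point` helper and the specific values are my own, not the spec's) looks like:

```python
from pymongo import AsyncMongoClient


async def configure_fail_point(client: AsyncMongoClient) -> None:
    # Illustrative failCommand fail point: make the next "find" fail with a
    # retryable error (91 = ShutdownInProgress). The real spec file sets
    # additional fields (e.g. appName) that are omitted here.
    await client.admin.command(
        {
            "configureFailPoint": "failCommand",
            "mode": {"times": 1},
            "data": {"failCommands": ["find"], "errorCode": 91},
        }
    )
```

With a fail point like this in place, the find is expected to fail once with the retryable error and then succeed on the retry.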
So far, these tests have only been observed to be flaky on Windows with Python 3.9 against newer server versions (mainly >7.0).
The first find command does fail as expected; however, the second find also fails with the same `_OperationCancelled` error.
Prior to the `_OperationCancelled`, there is a heartbeat failure that causes the pool to be cleared, which in turn cancels the in-progress operation. The test expects all of the failPoints to be consumed before the retry. However, lowering the failPoint `times` still resulted in the same error.
Notably, there was a heartbeat failure, followed by a heartbeat success, and then another heartbeat failure on the same connection.
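One way to observe this heartbeat sequence outside the unified runner is pymongo's monitoring API; the `HeartbeatLogger` class below is just an illustrative listener, not part of the test suite:

```python
from pymongo import monitoring


class HeartbeatLogger(monitoring.ServerHeartbeatListener):
    """Log each heartbeat outcome along with the connection it ran on."""

    def started(self, event: monitoring.ServerHeartbeatStartedEvent) -> None:
        print(f"heartbeat started on {event.connection_id}")

    def succeeded(self, event: monitoring.ServerHeartbeatSucceededEvent) -> None:
        print(f"heartbeat succeeded on {event.connection_id}")

    def failed(self, event: monitoring.ServerHeartbeatFailedEvent) -> None:
        print(f"heartbeat FAILED on {event.connection_id}: {event.reply!r}")


# Passed to the client under test, e.g.:
# client = AsyncMongoClient(uri, event_listeners=[HeartbeatLogger()])
```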
Additionally, I've attached some log files:
- output.log is the original test log
- output_failed2.log contains the test logs when the failPoint is configured to fail twice.
- is related to PYTHON-4493 Use asyncio protocols instead of sockets for network IO (Closed)
- related to PYTHON-5174 Test failure - test_Driver_extends_timeout_while_streaming (Investigating)