-
Type: Build Failure
-
Resolution: Fixed
-
Priority: Unknown
-
Affects Version/s: None
-
Component/s: None
-
Python Drivers
-
Not Needed
Name of Failure:
test.asynchronous.test_retryable_reads.TestRetryableReads.test_03_01_retryable_reads_caused_by_overload_errors_are_retried_on_a_different_replicaset_server_when_one_is_available_and_overload_retargeting_is_enabled
Link to task:
Context of when and why the failure occurred:
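The overload retargeting tests do not wait for all nodes, including secondaries, to be discovered before the test operation begins. The linked test can only pass if at least one secondary is known when the retry is selected, so this is a race condition: if discovery has not finished, the retry lands on the same server as the failed attempt (('localhost', 27018) in the stack trace) and the final assertion fails.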
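The race can be illustrated with a simplified model of retry server selection. This is a sketch under assumptions, not PyMongo's implementation: `select_server` and the `deprioritized` set are hypothetical names, ('localhost', 27018) is assumed to be the primary, and ('localhost', 27019) is an assumed secondary address. The model avoids the server that just failed only when another server is already known, and otherwise falls back to it.

```python
def select_server(known_servers, deprioritized):
    """Pick a server, avoiding deprioritized ones when an alternative exists.

    Hypothetical helper for illustration only; not a PyMongo API.
    """
    candidates = [s for s in known_servers if s not in deprioritized]
    # Fall back to every known server when filtering leaves nothing.
    pool = candidates or list(known_servers)
    if not pool:
        raise LookupError("no servers known")
    return pool[0]


primary = ("localhost", 27018)    # assumed to be the primary
secondary = ("localhost", 27019)  # assumed secondary address

# Race lost: only the primary is known when the retry is selected,
# so the retry goes back to the server that just failed.
retry_before_discovery = select_server([primary], deprioritized={primary})

# Race won: the secondary was discovered first, so the retry moves to it.
retry_after_discovery = select_server([primary, secondary], deprioritized={primary})
```

In the first case both command events carry the same address, which matches the failing assertion in the trace.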
Stack trace:
[2026/04/15 12:40:23.833] FAILURE: assert ('localhost', 27018) != ('localhost', 27018)
[2026/04/15 12:40:23.833]  + where ('localhost', 27018) = <CommandFailedEvent ('localhost', 27018) db: 't', command: 'find', operation_id: 1978461562, duration_micros: 412, failure: {'errorLabels': ['RetryableError', 'SystemOverloadedError'], 'ok': 0.0, 'errmsg': "Failing command via 'failCommand' failpoint", 'code': 6, 'codeName': 'HostUnreachable', '$clusterTime': {'clusterTime': Timestamp(1776281717, 20), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1776281717, 20)}, service_id: None, server_connection_id: 3871>.connection_id
[2026/04/15 12:40:23.833]  + and ('localhost', 27018) = <CommandSucceededEvent ('localhost', 27018) db: 't', command: 'find', operation_id: 11808384, duration_micros: 458, service_id: None, server_connection_id: 3871>.connection_id ()
[2026/04/15 12:40:23.833] self = <test.asynchronous.test_retryable_reads.TestRetryableReads testMethod=test_03_01_retryable_reads_caused_by_overload_errors_are_retried_on_a_different_replicaset_server_when_one_is_available_and_overload_retargeting_is_enabled>
[2026/04/15 12:40:23.833]     @async_client_context.require_replica_set
[2026/04/15 12:40:23.833]     @async_client_context.require_secondaries_count(1)
[2026/04/15 12:40:23.833]     @async_client_context.require_failCommand_fail_point
[2026/04/15 12:40:23.833]     @async_client_context.require_version_min(4, 4, 0)
[2026/04/15 12:40:23.833]     async def test_03_01_retryable_reads_caused_by_overload_errors_are_retried_on_a_different_replicaset_server_when_one_is_available_and_overload_retargeting_is_enabled(
[2026/04/15 12:40:23.833]         self
[2026/04/15 12:40:23.833]     ):
[2026/04/15 12:40:23.833]         listener = OvertCommandListener()
[2026/04/15 12:40:23.833]
[2026/04/15 12:40:23.833]         # 1. Create a client `client` with `retryReads=true`, `readPreference=primaryPreferred`, `enableOverloadRetargeting=True`, and command event monitoring enabled.
[2026/04/15 12:40:23.833]         client = await self.async_rs_or_single_client(
[2026/04/15 12:40:23.833]             event_listeners=[listener],
[2026/04/15 12:40:23.833]             retryReads=True,
[2026/04/15 12:40:23.833]             readPreference="primaryPreferred",
[2026/04/15 12:40:23.833]             enableOverloadRetargeting=True,
[2026/04/15 12:40:23.833]         )
[2026/04/15 12:40:23.833]
[2026/04/15 12:40:23.833]         # 2. Configure a fail point with the RetryableError and SystemOverloadedError error labels.
[2026/04/15 12:40:23.833]         command_args = {
[2026/04/15 12:40:23.833]             "configureFailPoint": "failCommand",
[2026/04/15 12:40:23.833]             "mode": {"times": 1},
[2026/04/15 12:40:23.833]             "data": {
[2026/04/15 12:40:23.833]                 "failCommands": ["find"],
[2026/04/15 12:40:23.833]                 "errorLabels": ["RetryableError", "SystemOverloadedError"],
[2026/04/15 12:40:23.833]                 "errorCode": 6,
[2026/04/15 12:40:23.833]             },
[2026/04/15 12:40:23.833]         }
[2026/04/15 12:40:23.833]         await async_set_fail_point(client, command_args)
[2026/04/15 12:40:23.833]
[2026/04/15 12:40:23.833]         # 3. Reset the command event monitor to clear the fail point command from its stored events.
[2026/04/15 12:40:23.833]         listener.reset()
[2026/04/15 12:40:23.833]
[2026/04/15 12:40:23.833]         # 4. Execute a `find` command with `client`.
[2026/04/15 12:40:23.833]         await client.t.t.find_one({})
[2026/04/15 12:40:23.833]
[2026/04/15 12:40:23.833]         # 5. Assert that one failed command event and one successful command event occurred.
[2026/04/15 12:40:23.833]         self.assertEqual(len(listener.failed_events), 1)
[2026/04/15 12:40:23.833]         self.assertEqual(len(listener.succeeded_events), 1)
[2026/04/15 12:40:23.833]
[2026/04/15 12:40:23.833]         # 6. Assert that both events occurred on different servers.
[2026/04/15 12:40:23.833] >       assert listener.failed_events[0].connection_id != listener.succeeded_events[0].connection_id
[2026/04/15 12:40:23.833] E       assert ('localhost', 27018) != ('localhost', 27018)
[2026/04/15 12:40:23.833] E        + where ('localhost', 27018) = <CommandFailedEvent ('localhost', 27018) db: 't', command: 'find', operation_id: 1978461562, duration_micros: 412, failure: {'errorLabels': ['RetryableError', 'SystemOverloadedError'], 'ok': 0.0, 'errmsg': "Failing command via 'failCommand' failpoint", 'code': 6, 'codeName': 'HostUnreachable', '$clusterTime': {'clusterTime': Timestamp(1776281717, 20), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1776281717, 20)}, service_id: None, server_connection_id: 3871>.connection_id
[2026/04/15 12:40:23.833] E        + and ('localhost', 27018) = <CommandSucceededEvent ('localhost', 27018) db: 't', command: 'find', operation_id: 11808384, duration_micros: 458, service_id: None, server_connection_id: 3871>.connection_id
[2026/04/15 12:40:23.833] test/asynchronous/test_retryable_reads.py:321: AssertionError
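One possible way to close the race is to block between steps 1 and 2 of the test until at least one secondary has been discovered. The sketch below shows the polling pattern only and is not the fix applied for this ticket. `wait_until`, `FakeClient`, and the ('localhost', 27019) address are hypothetical stand-ins so the example runs without a server; a real test would poll the driver client's view of the topology instead.

```python
import asyncio
import time


async def wait_until(predicate, description, timeout=5.0, interval=0.01):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise AssertionError(f"Timed out waiting for: {description}")
        await asyncio.sleep(interval)


class FakeClient:
    """Hypothetical stand-in for a driver client; not PyMongo's API."""

    def __init__(self):
        self.secondaries = set()  # addresses of secondaries discovered so far

    async def discover(self, delay):
        # Simulates background monitoring finding a secondary after `delay` seconds.
        await asyncio.sleep(delay)
        self.secondaries.add(("localhost", 27019))


async def demo():
    client = FakeClient()
    monitor = asyncio.create_task(client.discover(0.05))
    # Hold the test operation until at least one secondary is known.
    found = await wait_until(
        lambda: set(client.secondaries), "a secondary to be discovered"
    )
    await monitor
    return found


discovered = asyncio.run(demo())
```

With a wait like this in place, the retry in step 4 would have a second server to choose from regardless of how quickly discovery completes.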