Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.3.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Cluster Scalability
Backwards Compatibility: Fully Compatible
Sprint: ClusterScalability 2Mar-16Mar
Linked BF Score: 200
Story Points: 1
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

On slow build variants such as TSAN debug, the stop_balancer() call in ShardedClusterFixture fails prematurely with pymongo.errors._OperationCancelled, even though the server-side balancerStop command eventually succeeds.

stop_balancer() creates a MongoClient with the default connectTimeoutMS of 30 seconds but passes maxTimeMS=300000 (5 minutes) to the server command. Under TSAN, localhost round-trip times for hello/heartbeat commands can exceed 30 seconds. When the PyMongo SDAM monitor's streaming hello exceeds the 30-second socket timeout (derived from connectTimeoutMS), the monitor resets the connection pool with interrupt_connections=True, which cancels the in-flight balancerStop command and raises _OperationCancelled.

We should raise connectTimeoutMS to match maxTimeMS, so that the client-side monitor does not time out before the server command does.
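A minimal sketch of the proposed change, assuming the fix is simply to pass the same value as both the client's connectTimeoutMS and the command's maxTimeMS. The names BALANCER_STOP_TIMEOUT_MS, balancer_stop_settings, and the standalone stop_balancer function here are hypothetical and are not taken from the actual ShardedClusterFixture code; only MongoClient, connectTimeoutMS, balancerStop, and maxTimeMS come from the description above.

```python
# Hypothetical illustration; not the actual resmoke fixture code.
BALANCER_STOP_TIMEOUT_MS = 300_000  # 5 minutes, the maxTimeMS quoted in the description


def balancer_stop_settings(max_time_ms=BALANCER_STOP_TIMEOUT_MS):
    """Build client options and the balancerStop command with matching timeouts."""
    # Client-side timeout: the monitor's socket timeout is derived from this value.
    client_options = {"connectTimeoutMS": max_time_ms}
    # Server-side time limit for the command; the command name must be the first key.
    command = {"balancerStop": 1, "maxTimeMS": max_time_ms}
    return client_options, command


def stop_balancer(host, port, max_time_ms=BALANCER_STOP_TIMEOUT_MS):
    """Run balancerStop through a client whose connectTimeoutMS equals maxTimeMS."""
    from pymongo import MongoClient  # lazy import; requires PyMongo to be installed

    client_options, command = balancer_stop_settings(max_time_ms)
    client = MongoClient(host, port, **client_options)
    try:
        return client.admin.command(command)
    finally:
        client.close()
```

Deriving both values from one argument keeps them from drifting apart if the command's time limit is changed later.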

related to

SERVER-122537 stop_balancer in between-test hooks lacks retry logic

Closed

Assignee: Abdul Qadeer
Reporter: Abdul Qadeer
Participants: Abdul Qadeer, Githook User
Votes: 0
Watchers: 3

Created: Mar 03 2026 11:27:47 PM UTC
Updated: Mar 24 2026 04:59:02 PM UTC
Resolved: Mar 04 2026 02:18:20 PM UTC
