- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- Fully Compatible
- ClusterScalability 2Mar-16Mar
- 200
- 1
On slow build variants such as TSAN debug, the stop_balancer() call in ShardedClusterFixture fails prematurely with pymongo.errors._OperationCancelled even though the server-side balancerStop command eventually succeeds. stop_balancer() creates a MongoClient with the default connectTimeoutMS of 30 seconds but passes maxTimeMS=300000 (5 minutes) to the server command. Under TSAN, localhost round-trip times for hello/heartbeat commands can exceed 30 seconds. When the PyMongo SDAM monitor's streaming hello exceeds the 30-second socket timeout (derived from connectTimeoutMS), the monitor resets the connection pool with interrupt_connections=True, which cancels the in-flight balancerStop command and raises _OperationCancelled.
We should raise connectTimeoutMS to match maxTimeMS, so that a slow heartbeat cannot cancel the command before its own deadline expires.
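A minimal sketch of the proposed change, not the actual resmoke patch. The helper names (balancer_client_options, balancer_stop_command, stop_balancer) and the uri parameter are hypothetical and only illustrate deriving the client's connectTimeoutMS from the same value that is sent to the server as maxTimeMS. It assumes pymongo is installed wherever stop_balancer() is actually called.

```python
# Timeout taken from the ticket: balancerStop is given 5 minutes server-side.
BALANCER_STOP_MAX_TIME_MS = 300_000


def balancer_client_options(max_time_ms=BALANCER_STOP_MAX_TIME_MS):
    """Hypothetical helper: client options whose connect timeout matches maxTimeMS.

    Per the ticket, the monitor's socket timeout is derived from
    connectTimeoutMS, so raising it keeps a slow hello from resetting the pool.
    """
    return {"connectTimeoutMS": max_time_ms}


def balancer_stop_command(max_time_ms=BALANCER_STOP_MAX_TIME_MS):
    """Hypothetical helper: the command document; the command name must be the first key."""
    return {"balancerStop": 1, "maxTimeMS": max_time_ms}


def stop_balancer(uri, max_time_ms=BALANCER_STOP_MAX_TIME_MS):
    """Sketch of a stop_balancer() that uses one timeout for client and command."""
    # Imported lazily so the helpers above can be used without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri, **balancer_client_options(max_time_ms))
    try:
        return client.admin.command(balancer_stop_command(max_time_ms))
    finally:
        client.close()
```

With a single constant feeding both the client option and the command, the two timeouts cannot drift apart if the value is changed later.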
- related to: SERVER-122537 stop_balancer in between-test hooks lacks retry logic (Closed)