Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: 4.11.1
Component/s: async, Connection Mgmt
Python Drivers
Detailed steps to reproduce the problem?
During routine MongoDB cluster maintenance on Atlas, our FastAPI async apps running on top of MongoDB just freeze, resulting in a complete platform outage.
No traceback is produced; the server processes, threads, and coroutines just block.
We monitor overall request latency, and for a short period requests return extremely slowly: requests that usually take 100 ms to 400 ms now take 20-35 seconds (!!!). No PyMongo exceptions are thrown whatsoever.
Here are our client settings:
"minPoolSize": 32,
"maxPoolSize": 128,
"socketTimeoutMS": 5000
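For reference, the settings above can be written out next to the other timeout options that PyMongo documents. This is a minimal sketch, not a confirmed fix: the extra option values, the `build_uri` helper, and the placeholder host are assumptions made for illustration. PyMongo's documented defaults are 30000 ms for `serverSelectionTimeoutMS` and 20000 ms for `connectTimeoutMS`, which is in the same range as the latencies reported here, although nothing in this issue confirms that those timeouts are involved.

```python
from urllib.parse import urlencode

# Settings reported in this issue.
REPORTED_OPTIONS = {
    "minPoolSize": 32,
    "maxPoolSize": 128,
    "socketTimeoutMS": 5000,
}

# Other documented PyMongo timeout options. The values below are
# illustrative assumptions, not recommendations from the driver team.
EXTRA_TIMEOUT_OPTIONS = {
    "serverSelectionTimeoutMS": 5000,  # documented default: 30000
    "connectTimeoutMS": 5000,          # documented default: 20000
    "waitQueueTimeoutMS": 5000,        # documented default: no timeout
}


def build_uri(base_uri: str, options: dict) -> str:
    """Hypothetical helper: append options to a URI as a query string."""
    separator = "&" if "?" in base_uri else "?"
    return base_uri + separator + urlencode(options)


options = {**REPORTED_OPTIONS, **EXTRA_TIMEOUT_OPTIONS}

# Placeholder host; the real cluster address is not part of this report.
uri = build_uri("mongodb+srv://cluster.example.invalid/", options)
print(uri)
```

The same `options` dictionary could instead be passed as keyword arguments, for example `AsyncMongoClient(base_uri, **options)`.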
This is not the first time we've seen this behavior. We believed that socketTimeoutMS would be the key to triggering node failure detection, but that never happens.
What are we doing wrong? Which setting do we need to tweak in order to overcome these outages?
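Independently of driver settings, one way to make a silent stall visible is to put an application-level deadline around each awaited operation. The sketch below is a diagnostic aid under stated assumptions, not a fix for the behavior described: `run_with_deadline` and `slow_query` are hypothetical names, and `slow_query` stands in for a database call that stalls. `asyncio.wait_for` cancels the wrapped awaitable on timeout; whether an in-flight driver operation reacts promptly to that cancellation is not established by this report.

```python
import asyncio


async def run_with_deadline(operation, timeout_s: float):
    """Await `operation`, raising asyncio.TimeoutError past `timeout_s`."""
    return await asyncio.wait_for(operation, timeout=timeout_s)


async def slow_query():
    # Hypothetical stand-in for a database call that stalls.
    await asyncio.sleep(10)
    return "done"


async def main():
    try:
        await run_with_deadline(slow_query(), timeout_s=0.1)
    except asyncio.TimeoutError:
        # The stall now surfaces as an exception that can be logged.
        return "timed out"
    return "completed"


result = asyncio.run(main())
print(result)
```

With a deadline like this in place, a stalled request produces a log entry and a traceback instead of blocking without any trace.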
Definition of done: what must be done to consider the task complete?
The PyMongo async client properly handles topology changes during server maintenance routines.
The exact Python version used, with patch level:
3.12.9
The exact version of PyMongo used, with patch level:
4.11.1
Describe how MongoDB is set up. Local vs Hosted, version, topology, load balanced, etc.
Atlas, M60 NVMe SSD cluster in the AWS Frankfurt region
The operating system and version (e.g. Windows 7, OSX 10.8, ...)
Debian bullseye (12)
Web framework or asynchronous network library used, if any, with version (e.g. Django 1.7, mod_wsgi 4.3.0, gevent 1.0.1, Tornado 4.0.2, ...)
FastAPI 0.115.11