Type: Task
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.0.3, 4.0
Component/s: Connection Mgmt
Environment: Ubuntu 16.04, Docker and Kubernetes
Hello, this is a bit of a long story; I'm sorry, but I don't know how to summarize it.
h2. Context
I tried to run mongos processes in Kubernetes as a Deployment with a headless Service on top of it. With a headless service, the service name resolves to a set of A records, each pointing to the IP of one Kubernetes pod.
Now, from the MongoDB docs I learned that mongos processes keep some per-cursor state, so all requests for a cursor should always go to the same mongos.
My services instantiate a single PyMongo client in main and share it across all requests, which are handled in parallel.
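To check what the headless service name actually resolves to, a small helper around the standard-library socket.getaddrinfo can list every address returned. This is a minimal sketch: resolve_all is a hypothetical helper, and the service name in the comment is a placeholder for whatever the headless service is called.

```python
import socket
from typing import List


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct IP that `host` resolves to, in resolver order.

    For a Kubernetes headless service this should be one address per pod.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ips: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
        if ip not in ips:
            ips.append(ip)
    return ips


# Example (hypothetical service name; run from inside the cluster):
#   resolve_all("mongos-headless.default.svc.cluster.local", 27017)
# Repeated calls may return the same pod IPs in a different order.
```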
h2. The problem
The problem appears to be here: https://github.com/mongodb/mongo-python-driver/blob/3.0.3/pymongo/pool.py#L365
Because the A records are not returned in the same order every time (I think this is because the DNS server rotates them round-robin), sockets in the same pool end up connected to different IPs, that is, to different mongos pods.
Now, because random selection is used here: https://github.com/mongodb/mongo-python-driver/blob/3.0.3/pymongo/topology.py#L119, and a socket is selected every time a thread wants to send a message to MongoDB, the getMore request can go to a different mongos than the one that created the cursor, and an error is raised.
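The effect can be reproduced without a cluster. The sketch below is a simplified, hypothetical model, not PyMongo code: RoundRobinResolver stands in for a DNS server that rotates its A records, and NaivePool resolves the host name every time it opens a socket and uses the first address. That is my reading of what the linked pool.py code does when the first address accepts the connection.

```python
from collections import Counter
from typing import Callable, List


class RoundRobinResolver:
    """Hypothetical DNS stand-in that rotates its A records on every query."""

    def __init__(self, ips: List[str]) -> None:
        self._ips = list(ips)
        self._offset = 0

    def resolve(self, host: str) -> List[str]:
        # `host` is ignored: this fake serves a single headless-service name.
        n = len(self._ips)
        answer = [self._ips[(self._offset + i) % n] for i in range(n)]
        self._offset += 1  # the next query starts one record later
        return answer


class NaivePool:
    """Simplified pool that re-resolves the host name for every new socket."""

    def __init__(self, host: str, resolve: Callable[[str], List[str]]) -> None:
        self.host = host
        self._resolve = resolve
        self.sockets: List[str] = []  # each "socket" is recorded as its peer IP

    def connect(self) -> str:
        # Use the first address returned, as happens when it accepts the connection.
        ip = self._resolve(self.host)[0]
        self.sockets.append(ip)
        return ip


resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
pool = NaivePool("mongos-headless", resolver.resolve)
for _ in range(6):
    pool.connect()
print(Counter(pool.sockets))  # the six sockets are spread over all three IPs
```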
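The consequence for cursors can be modeled the same way. This is again a hypothetical sketch: Mongos tracks open cursors only in its own process, UnknownCursor is an illustrative error rather than the exception PyMongo actually raises, and the socket checkout is modeled as a random pick from the pool, which simplifies what the driver really does.

```python
import random
from typing import List, Set


class UnknownCursor(Exception):
    """Illustrative error: a mongos received a getMore for a cursor it never opened."""


class Mongos:
    """Hypothetical mongos model: open cursors are tracked only in this process."""

    def __init__(self, ip: str) -> None:
        self.ip = ip
        self.open_cursors: Set[int] = set()

    def find(self, cursor_id: int) -> None:
        # The initial query creates the cursor on this mongos only.
        self.open_cursors.add(cursor_id)

    def get_more(self, cursor_id: int) -> None:
        if cursor_id not in self.open_cursors:
            raise UnknownCursor(f"cursor {cursor_id} not found on {self.ip}")


def run_query(pool: List[Mongos], cursor_id: int, rng: random.Random) -> bool:
    """Send find, then getMore, each over a randomly picked pooled socket.

    Returns True if the getMore reached the mongos that owns the cursor.
    """
    rng.choice(pool).find(cursor_id)
    try:
        rng.choice(pool).get_more(cursor_id)
        return True
    except UnknownCursor:
        return False


a, b = Mongos("10.0.0.1"), Mongos("10.0.0.2")
pool = [a, b, a, b]  # four pooled sockets, two connected to each mongos
rng = random.Random(0)
results = [run_query(pool, cursor_id, rng) for cursor_id in range(1000)]
print(f"{results.count(False)} of {len(results)} getMores hit the wrong mongos")
```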
h2. Proposed solution
In my view, the problem would be fixed if DNS resolution were done once, when the Cursor is initialized, and only that IP address were then used to get sockets for that cursor.
I would be happy to contribute the code once we agree on a solution.
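A rough sketch of the proposed behavior, under a hypothetical model (none of these classes exist in PyMongo): PinnedCursor resolves the host once when it is created, keeps the first returned IP, and sends every getMore for that cursor to the mongos at that IP.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple


class Mongos:
    """Hypothetical mongos model: cursor state lives only in this process."""

    def __init__(self) -> None:
        self._cursors: Dict[int, List[int]] = {}
        self._ids = itertools.count(1)

    def _next_batch(self, cursor_id: int, docs: List[int], size: int) -> Tuple[int, List[int]]:
        batch, rest = docs[:size], docs[size:]
        if rest:
            self._cursors[cursor_id] = rest
            return cursor_id, batch
        self._cursors.pop(cursor_id, None)
        return 0, batch  # cursor id 0 signals that the cursor is exhausted

    def find(self, docs: List[int], size: int) -> Tuple[int, List[int]]:
        return self._next_batch(next(self._ids), list(docs), size)

    def get_more(self, cursor_id: int, size: int) -> Tuple[int, List[int]]:
        # Raises KeyError if this mongos never opened the cursor.
        return self._next_batch(cursor_id, self._cursors[cursor_id], size)


class PinnedCursor:
    """Proposed behavior: resolve DNS once per cursor and reuse that IP."""

    def __init__(self, host: str, resolve: Callable[[str], List[str]],
                 servers: Dict[str, Mongos], docs: List[int], size: int) -> None:
        self.ip = resolve(host)[0]       # the only DNS lookup for this cursor
        self._server = servers[self.ip]  # every later request goes here
        self._size = size
        self._cursor_id, self._buffer = self._server.find(docs, size)

    def __iter__(self) -> Iterator[int]:
        while True:
            yield from self._buffer
            if self._cursor_id == 0:
                return
            # getMore goes to the pinned mongos, never to a re-resolved address.
            self._cursor_id, self._buffer = self._server.get_more(self._cursor_id, self._size)


# Usage: DNS answers alternate between the two pods, as with round-robin records.
servers = {"10.0.0.1": Mongos(), "10.0.0.2": Mongos()}
answers = itertools.cycle([["10.0.0.1", "10.0.0.2"], ["10.0.0.2", "10.0.0.1"]])
first = PinnedCursor("mongos-headless", lambda host: next(answers), servers, list(range(7)), 3)
second = PinnedCursor("mongos-headless", lambda host: next(answers), servers, list(range(5)), 2)
print(first.ip, list(first))
print(second.ip, list(second))
```

Only the individual cursor is pinned, so different cursors can still land on different pods and the load is still spread across the deployment. If the pinned pod disappears the cursor fails, but its state would have been lost in that case anyway.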