- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.2
- Component/s: None
This is a bug in the socket-error handling code of PyMongo 2.2 (not yet released).
The good: when a Connection or ReplicaSetConnection gets a socket error, it calls self.disconnect(), which calls Pool.reset(). That closes any idle sockets in the Pool and also closes the current thread's request socket if the thread is in a request (which it usually is, because auto_start_request defaults to True).
The bad: other threads that are alive when Pool.reset() is called keep their request sockets, and they continue to use those sockets for as long as they live. If the sockets are still good, that's fine. But if reset() was called because of a condition that killed all the sockets, e.g. the primary stepped down and closed its connections, or the server was restarted, then each of those threads' next operation will raise an AutoReconnect exception. In addition, if a thread terminates or calls end_request() without having used its request socket since Pool.reset(), its request socket is returned to the pool as an idle socket. The next thread that needs a socket then gets the bad one and raises AutoReconnect when it first uses it.
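The sequence above can be reproduced with a toy model. This is only a sketch, not PyMongo's code: ToySocket, ToyPool, the server_dropped flag and demo_stale_socket are hypothetical names invented for illustration, and the pool logic is a simplified assumption based on the description in this ticket.

```python
import threading


class ToySocket:
    """Hypothetical stand-in for a network socket."""

    def __init__(self):
        self.closed = False          # closed locally by the pool
        self.server_dropped = False  # killed by the (simulated) server


class ToyPool:
    """Hypothetical pool with per-thread request sockets."""

    def __init__(self):
        self.idle = set()
        self.local = threading.local()

    def get_socket(self):
        # A thread in a request keeps reusing its own request socket.
        sock = getattr(self.local, "sock", None)
        if sock is None:
            sock = self.idle.pop() if self.idle else ToySocket()
            self.local.sock = sock
        return sock

    def end_request(self):
        # The request socket goes back to the idle set with no health check.
        sock = getattr(self.local, "sock", None)
        if sock is not None:
            self.idle.add(sock)
            self.local.sock = None

    def reset(self):
        # Closes idle sockets and the *calling* thread's request socket only.
        for sock in self.idle:
            sock.closed = True
        self.idle.clear()
        sock = getattr(self.local, "sock", None)
        if sock is not None:
            sock.closed = True
            self.local.sock = None


def demo_stale_socket():
    pool = ToyPool()
    has_socket, reset_done = threading.Event(), threading.Event()
    seen = {}

    def worker():
        seen["worker"] = pool.get_socket()  # worker is now in a request
        has_socket.set()
        reset_done.wait()
        pool.end_request()                  # returns the stale socket unused

    thread = threading.Thread(target=worker)
    thread.start()
    has_socket.wait()

    main_sock = pool.get_socket()
    # Simulated network event: the server drops every connection.
    main_sock.server_dropped = True
    seen["worker"].server_dropped = True
    pool.reset()                            # main thread noticed the error
    reset_done.set()
    thread.join()

    next_sock = pool.get_socket()
    # The main thread is handed the worker's dead socket.
    return next_sock is seen["worker"], next_sock.server_dropped


if __name__ == "__main__":
    print(demo_stale_socket())
```

In this model, reset() never sees the worker's request socket because it lives in thread-local storage, which is the situation the paragraph above describes.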
If your application has lots of threads that rarely access Mongo, and you leave auto_start_request at its default of True, then a network event like a stepdown will cause AutoReconnects on various threads for an indeterminate period into the future. The desired behavior is that a network event causes one AutoReconnect on the current thread and that all sockets are then recreated afresh. Connection.disconnect() currently doesn't do that.
In PyMongo 2.1, Connection.disconnect() deleted the whole Pool instance and created a new one, rather than calling Pool.reset(). We should revert to that behavior, in which case Pool.reset() might be removed.
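A minimal sketch of the pool-replacement idea, again as a toy model rather than PyMongo's actual code: ToySocket, ToyPool, ToyConnection and demo_pool_replacement are hypothetical names, and the assumption is that every thread looks up its request socket through the connection's current pool.

```python
import threading


class ToySocket:
    """Hypothetical stand-in for a network socket."""

    def __init__(self):
        self.server_dropped = False  # set when the simulated server kills it


class ToyPool:
    """Hypothetical pool that keeps one request socket per thread."""

    def __init__(self):
        self.local = threading.local()

    def get_socket(self):
        sock = getattr(self.local, "sock", None)
        if sock is None:
            sock = self.local.sock = ToySocket()
        return sock


class ToyConnection:
    """Hypothetical connection that owns a pool."""

    def __init__(self):
        self.pool = ToyPool()

    def get_socket(self):
        return self.pool.get_socket()

    def disconnect(self):
        # Replace the whole pool. Request sockets held in the old pool's
        # thread-local storage are no longer reachable through the connection.
        self.pool = ToyPool()


def demo_pool_replacement():
    conn = ToyConnection()
    has_socket, disconnected = threading.Event(), threading.Event()
    seen = {}

    def worker():
        seen["before"] = conn.get_socket()  # worker's request socket
        has_socket.set()
        disconnected.wait()
        seen["after"] = conn.get_socket()   # looked up in the new pool

    thread = threading.Thread(target=worker)
    thread.start()
    has_socket.wait()

    # Simulated network event, followed by the main thread disconnecting.
    seen["before"].server_dropped = True
    conn.disconnect()
    disconnected.set()
    thread.join()

    # The worker gets a fresh socket, not the dropped one.
    return seen["after"] is not seen["before"] and not seen["after"].server_dropped


if __name__ == "__main__":
    print(demo_pool_replacement())
```

The sketch leaves out closing the old pool's sockets. A real implementation would also need to make sure they are closed or garbage-collected.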