Priority: Major - P3
Resolution: Works as Designed
Affects Version/s: 3.4, 3.6
Fix Version/s: None
Component/s: Connection Management
Environment: Linux 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018
uWSGI 2.0.14 (64bit)
I have a web application developed with Flask, hosted on www.pythonanywhere.com servers (they rely on AWS). It is managed by a uWSGI server (with options --enable-threads and --lazy-app disabled, and I cannot change these options). My application interacts with a MongoDB database hosted by MongoLab, using pymongo (mainly 3.4, but 3.6 shows the same issue). The Python version is 2.7.6 because pythonanywhere does not offer a more recent release.
I create a Mongo client with the connect=False option to work around pymongo not being fork-safe, and with maxPoolSize=1 because pythonanywhere does not allow Python threading (all other parameters keep their default values). The client is created once at application startup and then imported where I define my routes.
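To illustrate what "not fork-safe" means in this context, here is a minimal standard-library sketch. It does not use pymongo at all: the `monitor` thread is only a stand-in for the background threads a client starts, and `thread_count_seen_by_child` is a hypothetical helper name. It shows that a thread started before `os.fork()` does not exist in the child process.

```python
import os
import threading


def thread_count_seen_by_child():
    """Fork once and report threading.active_count() as seen by the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: only the thread that called fork() is copied.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode("ascii"))
        os._exit(0)  # leave immediately, skipping the parent's cleanup code
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode("ascii"))


# A background thread standing in for a client's monitor thread (assumption).
stop = threading.Event()
monitor = threading.Thread(target=stop.wait)
monitor.daemon = True
monitor.start()

parent_count = threading.active_count()  # calling thread plus the monitor
child_count = thread_count_seen_by_child()

stop.set()
monitor.join()
print("parent threads: %d, child threads: %d" % (parent_count, child_count))
```

This is why a client that has already started its background threads before the fork cannot simply be reused in a worker, and why connect=False defers that work until first use.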
I am seeing intermittent hangs in pymongo (not systematic, roughly once every 2-3 attempts) on the very first request that I send to my application. The application then times out and cannot process any other request, so it has to be restarted.
pythonanywhere.com gives me very few tools to monitor what happens, and their log files show no anomaly. By adding print statements to my code, and then to the pymongo code, I could see that the hang happens when opening the _topology attribute of the client: its open method calls _ensure_opened, which on the first pass calls an open method on every server in the _servers attribute; each server's open method is in turn delegated to a Monitor object. I do not fully understand what this part of your code is intended to do or how it works, but apparently there is a PeriodicExecutor object which, on open, creates a thread that runs a _run function, and the hang happens there.
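To make the call chain above easier to follow, here is a deliberately simplified stand-in for an executor whose open method starts a background thread running a _run loop. This is not pymongo's actual implementation: the class name `SketchPeriodicExecutor`, its `join` method and the `check_server` callback are hypothetical and exist only for illustration.

```python
import threading
import time


class SketchPeriodicExecutor(object):
    """Simplified illustration: open() starts a thread that calls target periodically."""

    def __init__(self, interval, target):
        self._interval = interval
        self._target = target
        self._stopped = threading.Event()
        self._thread = None
        self._lock = threading.Lock()

    def open(self):
        # Start the background thread unless one is already running.
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return
            self._stopped.clear()
            thread = threading.Thread(target=self._run)
            thread.daemon = True
            self._thread = thread
            thread.start()

    def close(self):
        # Ask the loop to stop; it exits at its next wake-up.
        self._stopped.set()

    def join(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)

    def _run(self):
        # Call target, then sleep for the interval, until closed
        # or until target returns a false value.
        while not self._stopped.is_set():
            if not self._target():
                break
            self._stopped.wait(self._interval)


calls = []


def check_server():
    # Hypothetical stand-in for a monitor's periodic server check.
    calls.append(time.time())
    return True


executor = SketchPeriodicExecutor(0.01, check_server)
executor.open()
time.sleep(0.1)
executor.close()
executor.join(1.0)
print("check_server ran %d time(s)" % len(calls))
```

In this sketch nothing happens until the new thread is actually scheduled, so if that thread never gets to run, any caller waiting on its results would wait indefinitely.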
I tried to force the connection before the first request by using Flask's before_first_request decorator and calling the open method on the _topology attribute, and I found that this operation may hang as well.
I have to download some data from my DB at startup, and I use a dedicated client for this, which I close after use. This client, which is used only before forking, never stalls during startup.
I also tried to use this latter client in my routes instead of the client created with connect=False (which of course raises the warning about forking). Surprisingly, this client seems to hang less frequently when I send a first request to my app, but it can still hang.
So, to sum up: whether I create my client with connect=True or connect=False, my startup operations with the DB, which happen before forking, are always successful, but the very first operation with the client after forking may hang (although it may also work).
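One general mechanism that can produce a hang only after a fork is a lock that another thread held at the moment of the fork. I have not confirmed that this is what happens here. The standard-library sketch below, with hypothetical names throughout and no pymongo code, shows that the child inherits the lock in its locked state even though the thread that held it is gone.

```python
import os
import threading


def child_can_acquire(lock):
    """Fork and report whether the child can take `lock` without blocking."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        acquired = lock.acquire(False)  # non-blocking, so the sketch never hangs
        os.write(write_fd, b"1" if acquired else b"0")
        os._exit(0)
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


shared_lock = threading.Lock()
lock_is_held = threading.Event()
may_release = threading.Event()


def hold_lock():
    # Take the lock and keep it until the main thread allows release.
    with shared_lock:
        lock_is_held.set()
        may_release.wait()


holder = threading.Thread(target=hold_lock)
holder.daemon = True
holder.start()
lock_is_held.wait()

# Fork while another thread holds the lock.
acquired_while_held = child_can_acquire(shared_lock)

may_release.set()
holder.join()

# Fork again now that nobody holds the lock.
acquired_when_free = child_can_acquire(shared_lock)
print("child got lock while held: %s, when free: %s"
      % (acquired_while_held, acquired_when_free))
```

A blocking acquire in the first child would never return, which matches the symptom of a first post-fork operation that sometimes hangs and sometimes works depending on timing.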
So I wonder what is happening: could this be some kind of deadlock?
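Since print statements are the only diagnostic available to me, a stack dump of every live thread taken around the suspect call would be more informative than individual prints. A minimal sketch using only the standard library follows; `dump_all_thread_stacks` is a hypothetical helper name, and where to call it from is left open.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks():
    """Return a text report of the current stack of every live thread."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


report = dump_all_thread_stacks()
print(report)
```

Because the report is a plain string, it can be printed or written to whatever log file the host exposes.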