Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: None
None
Fully Compatible
An application running on Python 2.7.0 or older (that is, not 2.7.1 and up) that continuously creates and destroys very large numbers of threads over a long period can leave a growing number of unreclaimable sockets in Pool._tid_to_sock. This happens only if the application calls end_request at least once and calls start_request more often than end_request. These sockets will eventually exceed the process's ulimit or the server's.
Background: When a thread calls start_request it's assigned a socket, which is put into a dict called _tid_to_sock, keyed with its thread id. We use a weakref callback to a threadlocal to know if the thread dies without calling end_request; if so, we remove its socket from _tid_to_sock and return it to the general pool. See "Knowing When a Python Thread Has Died" for more details on this technique.
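The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual Pool: SketchPool, Vigil, the idle list, and the _refs dict are hypothetical names, and plain objects stand in for sockets.

```python
import threading
import weakref


class Vigil(object):
    """Lives in a threadlocal, so it is collected when its thread dies."""


class SketchPool(object):
    """Hypothetical, simplified pool; plain objects stand in for sockets."""

    def __init__(self):
        self._lock = threading.Lock()
        self._local = threading.local()
        self._tid_to_sock = {}  # thread id -> socket reserved by start_request
        self._refs = {}         # thread id -> weakref, kept so the callback can fire
        self.idle = []          # sockets available to any thread

    def start_request(self):
        tid = threading.current_thread().ident

        def on_thread_died(ref, tid=tid):
            # Runs when the thread's Vigil is collected without end_request.
            with self._lock:
                self._refs.pop(tid, None)
                sock = self._tid_to_sock.pop(tid, None)
                if sock is not None:
                    self.idle.append(sock)

        with self._lock:
            if tid in self._tid_to_sock:
                # Already in a request: keep the same socket.
                return self._tid_to_sock[tid]
            sock = self.idle.pop() if self.idle else object()
            self._tid_to_sock[tid] = sock
            self._local.vigil = vigil = Vigil()
            self._refs[tid] = weakref.ref(vigil, on_thread_died)
        return sock

    def end_request(self):
        tid = threading.current_thread().ident
        with self._lock:
            # Dropping the weakref cancels the pending callback.
            self._refs.pop(tid, None)
            sock = self._tid_to_sock.pop(tid, None)
            if sock is not None:
                self.idle.append(sock)
```

In this sketch, a thread that exits without calling end_request has its socket moved back to idle by the weakref callback. The bug described in this ticket corresponds to that callback never firing for some threads.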
Threadlocals have a number of charming quirks rooted in Python issue 1868, which was fixed in Python 2.7.1. We thought we'd nailed them all, but a symptom remains. It can be reproduced with the following code:
    import threading
    import weakref

    nthreads = 10000
    ncallbacks = 0
    ncallbacks_lock = threading.Lock()
    local = threading.local()
    refs = set()

    class Vigil(object):
        pass

    def run():
        def on_thread_died(ref):
            global ncallbacks
            ncallbacks_lock.acquire()
            ncallbacks += 1
            ncallbacks_lock.release()

        local.vigil = vigil = Vigil()
        refs.add(weakref.ref(vigil, on_thread_died))

    threads = [threading.Thread(target=run) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    getattr(local, 'c', None)  # Trigger cleanup in 2.7.0
    assert ncallbacks == nthreads, \
        'only %d callbacks run' % ncallbacks
It appears that assigning to a threadlocal isn't thread-safe in Python <= 2.7.0. If contention is high enough (that is, if there are enough threads, and if some threads are calling end_request while others call start_request), the threadlocal can become corrupted, and some objects stored in it are never cleaned up after their threads die.
Locking around the assignment fixes the problem:
    local_lock = threading.Lock()

    # ...

    local_lock.acquire()
    local.vigil = vigil = Vigil()
    local_lock.release()
    refs.add(weakref.ref(vigil, on_thread_died))
I expect a lock around threadlocal assignment in ThreadIdent.get() will fix the problem.
Additionally, removing an unused access of ThreadIdent.get() in end_request will reduce contention.
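As a rough sketch of what a lock around the threadlocal assignment in a get() method might look like: the class below is a hypothetical stand-in, not the driver's actual ThreadIdent. The names SketchThreadIdent, _Vigil, _local_lock, and watch are assumptions made for illustration.

```python
import threading
import weakref


class SketchThreadIdent(object):
    """Hypothetical stand-in; not the driver's actual ThreadIdent."""

    class _Vigil(object):
        """Collected when the owning thread's locals are torn down."""

    def __init__(self):
        self._local = threading.local()
        self._local_lock = threading.Lock()  # guards threadlocal assignment
        self._refs = {}                      # vigil id -> weakref

    def get(self):
        """Return an id for the current thread, creating its vigil once."""
        vigil = getattr(self._local, 'vigil', None)
        if vigil is None:
            vigil = self._Vigil()
            # Serialize the assignment so concurrent threads cannot
            # write to the threadlocal at the same time.
            with self._local_lock:
                self._local.vigil = vigil
        return id(vigil)

    def watch(self, callback):
        """Arrange for callback(ident) to run after the current thread dies."""
        ident = self.get()

        def on_thread_died(ref, ident=ident):
            self._refs.pop(ident, None)
            callback(ident)

        # Keep the weakref alive, otherwise its callback never fires.
        self._refs[ident] = weakref.ref(self._local.vigil, on_thread_died)
```

The lock is held only for the assignment itself, so in this sketch threads that already have a vigil never touch it. That matches the intent of also dropping the unused get() call in end_request: fewer threads competing for the same lock.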
is related to: PYTHON-353 Unbounded connection growth with Apache mod_wsgi 2.x (Closed)