Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.5.1
Affects Version/s: None
Component/s: None
Labels: None
Confidence Status: None
Backwards Compatibility: Fully Compatible
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Link: None
Goal Name(s): None

An application running on Python 2.7.0 or older (that is, before 2.7.1) can leak sockets if it continuously creates and destroys very large numbers of threads over a long period, calls end_request at least once, and calls start_request more often than end_request. Under those conditions a growing number of unreclaimable sockets accumulates in Pool._tid_to_sock, and the number of open sockets will eventually exceed the process's ulimit or the server's.

Background: When a thread calls start_request it's assigned a socket, which is put into a dict called _tid_to_sock, keyed with its thread id. We use a weakref callback to a threadlocal to know if the thread dies without calling end_request; if so, we remove its socket from _tid_to_sock and return it to the general pool. See "Knowing When a Python Thread Has Died" for more details on this technique.
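The technique described above can be sketched roughly as follows. This is a minimal illustration written for current Python, not the driver's code: TinyPool, _Vigil, idle, and _refs are hypothetical names, and sockets are replaced by placeholder objects. Only _tid_to_sock, start_request, and end_request come from the description.

```python
import threading
import weakref


class _Vigil(object):
    """Weak-referenceable placeholder stored in the threadlocal."""


class TinyPool(object):
    """Hypothetical stand-in for the driver's Pool."""

    def __init__(self):
        self._lock = threading.Lock()
        self._local = threading.local()
        self._tid_to_sock = {}  # thread id -> socket reserved by start_request
        self._refs = {}         # thread id -> weakref, kept so the callback can fire
        self.idle = []          # sockets available to any thread

    def _reclaim(self, tid):
        # Move the thread's socket, if any, back to the general pool.
        with self._lock:
            self._refs.pop(tid, None)
            sock = self._tid_to_sock.pop(tid, None)
            if sock is not None:
                self.idle.append(sock)

    def start_request(self):
        tid = threading.current_thread().ident
        sock = object()  # placeholder for a real socket

        def on_thread_died(ref, tid=tid):
            # Called when the vigil is collected, which happens once the
            # dead thread's threadlocal storage is cleaned up.
            self._reclaim(tid)

        vigil = _Vigil()
        self._local.vigil = vigil
        with self._lock:
            self._tid_to_sock[tid] = sock
            self._refs[tid] = weakref.ref(vigil, on_thread_died)

    def end_request(self):
        # Normal path: the thread returns its socket explicitly.
        self._reclaim(threading.current_thread().ident)
```

In this sketch, a thread that calls start_request and then exits without calling end_request still has its socket returned to idle, because the vigil stored in the threadlocal is destroyed along with the thread's local storage and the weakref callback runs. The bug described in this issue is the case where that cleanup silently fails to happen.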

Threadlocals have a number of charming quirks rooted in Python issue 1868, which was fixed in Python 2.7.1. We thought we'd nailed them all, but a symptom remains. It can be reproduced with the following code:

import threading
import weakref

nthreads = 10000
ncallbacks = 0
ncallbacks_lock = threading.Lock()
local = threading.local()
refs = set()

class Vigil(object):
    pass

def run():
    def on_thread_died(ref):
        global ncallbacks
        ncallbacks_lock.acquire()
        ncallbacks += 1
        ncallbacks_lock.release()

    local.vigil = vigil = Vigil()
    refs.add(weakref.ref(vigil, on_thread_died))

threads = [threading.Thread(target=run)
           for _ in range(nthreads)]
for t in threads: t.start()
for t in threads: t.join()
getattr(local, 'c', None)  # Trigger cleanup in 2.7.0
assert ncallbacks == nthreads, \
    'only %d callbacks run' % ncallbacks

It appears that assigning to a threadlocal isn't threadsafe in Python <= 2.7.0. If contention is high enough--that is, if there are enough threads, and if some threads are calling end_request as others call start_request--the threadlocal can become corrupted, and some objects stored in it are never cleaned up after their threads die.

Locking around the assignment fixes the problem:

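local_lock = threading.Lock()

def run():
    # ... on_thread_died defined as before ...
    # Serialize assignment to the threadlocal across threads.
    local_lock.acquire()
    try:
        local.vigil = vigil = Vigil()
    finally:
        local_lock.release()
    refs.add(weakref.ref(vigil, on_thread_died))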

I expect a lock around threadlocal assignment in ThreadIdent.get() will fix the problem.
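As a rough sketch of what that could look like, assuming ThreadIdent keeps a per-thread vigil object in a threadlocal and returns an identifier derived from it: the class name LockedThreadIdent, the _Vigil helper, and the attribute names below are hypothetical, and the real ThreadIdent may be structured differently.

```python
import threading


class _Vigil(object):
    """Weak-referenceable marker whose lifetime tracks its thread."""


class LockedThreadIdent(object):
    """Hypothetical illustration; not the driver's actual ThreadIdent."""

    def __init__(self):
        self._local = threading.local()
        self._lock = threading.Lock()

    def get(self):
        # Hold one shared lock for every read and write of the
        # threadlocal, so no two threads touch it at the same time.
        self._lock.acquire()
        try:
            vigil = getattr(self._local, 'vigil', None)
            if vigil is None:
                self._local.vigil = vigil = _Vigil()
        finally:
            self._lock.release()
        # The vigil's id is stable for as long as its thread lives.
        return id(vigil)
```

The lock is shared by all threads on purpose: the race is between different threads touching the same threadlocal object, so a per-thread lock would not help. On Python 2.7.1 and later the lock should be unnecessary and could be replaced with a no-op.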

Additionally, removing an unused access of ThreadIdent.get() in end_request will reduce contention.

Is related to: PYTHON-353 Unbounded connection growth with Apache mod_wsgi 2.x (Closed)

Assignee: A. Jesse Jiryu Davis
Reporter: A. Jesse Jiryu Davis
Votes: 0
Watchers: 2

Created: Apr 24 2013 07:29:55 PM UTC
Updated: May 13 2013 09:30:37 PM UTC
Resolved: Apr 25 2013 06:48:21 PM UTC
