Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.8, 3.0
Affects Version/s: None
Component/s: None
Labels: None
Confidence Status: None
Backwards Compatibility: Fully Compatible


In PyPy and CPython 3.4, incompletely iterating a Cursor can cause a deadlock in MongoClient when the Cursor is garbage collected.

Deadlock

If a Cursor is left open and it goes out of scope, its __del__ method sends a message to close the server-side cursor. In rare cases this causes a deadlock: if sending the message requires a lock that is already held by any thread, including the current thread, then __del__ cannot proceed and the process hangs.
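
The mechanism can be sketched without PyMongo. The following is a minimal illustration, not PyMongo's code: FakeCursor and drop_cursor_while_locked are hypothetical names, and the destructor uses a non-blocking acquire so that the sketch reports the conflict instead of hanging the way a real blocking acquire would.

```python
import threading

lock = threading.Lock()  # non-reentrant: the holder cannot acquire it again
attempts = []            # records whether the destructor could take the lock


class FakeCursor(object):
    """Hypothetical stand-in for a cursor whose destructor needs the lock."""

    def __del__(self):
        # A real destructor would block here. A non-blocking acquire lets
        # the sketch record the problem rather than deadlock.
        acquired = lock.acquire(False)
        attempts.append(acquired)
        if acquired:
            lock.release()


def drop_cursor_while_locked():
    cursor = FakeCursor()
    with lock:
        # In CPython, reference counting runs __del__ right here, on the
        # thread that already holds the lock.
        del cursor


drop_cursor_while_locked()
print(attempts)
```

Under CPython this prints [False]: the destructor ran while the lock was held and could not acquire it. In PyPy the destructor runs at some later garbage collection, which is what makes the real bug intermittent.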

In PyMongo 2.7+ and PyPy 2.4, this hang is easily reproduced:

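from pymongo import MongoClient

n = 0  # completed iterations

client = MongoClient()
collection = client.test.test
collection.drop()
collection.insert({} for _ in range(200))

while True:
    cursor = collection.find()
    next(cursor)         # fetch one document, leaving the cursor open
    del cursor           # in PyPy the destructor is deferred until a GC run
    client.disconnect()  # force the next operation to reconnect under the lock
    n += 1
    if not n % 1000:
        print(n)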

On my system this always hangs before the first 1000 iterations. "del cursor" makes the cursor into garbage, but its destructor doesn't run immediately. The call to "disconnect" means that the next trip through MongoClient.__ensure_member executes a code path that allocates an object while holding the lock. There's a chance this allocation triggers PyPy's GC, which runs the cursor destructor, which requires the MongoClient's lock. Hence the deadlock.

This bug was introduced when a lock was added to MongoClient in PYTHON-487, released in PyMongo 2.7.

MongoReplicaSetClient has an older, simpler concurrency design, which does not take a lock in the close-cursor path.

In PyMongo 3.0, the Topology class does many more allocations while holding a lock, so deadlocks in PyPy related to unclosed cursors are quite common. They occur roughly once in every five to ten runs of the test suite: the cursors left open by test_limit_and_batch_size often deadlock the process when they are garbage collected, dozens of tests later in the suite.

Theoretical Deadlocks

A rarer source of deadlocks may be in the connection pool, which can allocate a set object while holding a lock in Pool.reset. If I add a call to gc.collect in Pool.reset it deadlocks, but I can't prove a garbage collection is ever triggered in the real world in that code path, so the risk is theoretical. This applies equally to MongoClient and MongoReplicaSetClient.

In CPython 3.4, a deadlock can theoretically result if an open cursor is referenced by cyclic garbage:

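# Setup (client, collection, n) is the same as in the PyPy script above.
while True:
    cursor = collection.find()
    next(cursor)
    d = {'c': cursor}
    d['d'] = d        # self-reference: only the cyclic GC can free d
    del cursor, d     # the cursor is now reachable only from cyclic garbage
    client.disconnect()
    n += 1
    if not n % 1000:
        print(n)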

This causes delayed garbage collection, as in PyPy, so the cursor's destructor can run on a thread that already holds the lock. In CPython versions before 3.4, cyclic garbage that referred to a Cursor was never freed, because Cursor defines a __del__ method. In any case, I have not been able to trigger this theoretical deadlock in the real world.
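
The cyclic-garbage case can be illustrated in isolation. This is a sketch under stated assumptions: it requires CPython 3.4 or later, FakeCursor is a hypothetical stand-in for Cursor, and the explicit gc.collect() stands in for a collection that an allocation happens to trigger while the lock is held. The destructor uses a non-blocking acquire so the sketch records the conflict instead of hanging.

```python
import gc
import threading

lock = threading.Lock()
results = []  # records whether the destructor could take the lock


class FakeCursor(object):
    """Hypothetical stand-in for a cursor whose destructor needs the lock."""

    def __del__(self):
        acquired = lock.acquire(False)  # non-blocking, so the sketch cannot hang
        results.append(acquired)
        if acquired:
            lock.release()


def make_cyclic_garbage():
    d = {'c': FakeCursor()}
    d['d'] = d  # the dict refers to itself, so refcounting alone cannot free it


gc.disable()  # keep an automatic collection from firing early
make_cyclic_garbage()
with lock:
    gc.collect()  # stands in for a collection triggered by an allocation
gc.enable()
print(results)
```

On CPython 3.4+ this prints [False]: the collector finalized the cursor on the thread holding the lock, and a blocking acquire at that point would never return.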

Fixes

In PyMongo 2.8 the simplest fix is to do object allocations before taking a lock.
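
As a rough sketch of that idea (FakePool and its methods are hypothetical and are not PyMongo's Pool class), the replacement set is created before the lock is taken, so no new container is created inside the critical section:

```python
import threading


class FakePool(object):
    """Hypothetical pool used only to illustrate the allocation order."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sockets = set()

    def reset_risky(self):
        with self.lock:
            old = self.sockets
            # set() allocates while the lock is held. If the allocation
            # triggers a collection, a destructor needing the lock blocks.
            self.sockets = set()
        return old

    def reset_safer(self):
        fresh = set()  # allocate first, with no lock held
        with self.lock:
            # Only rebinding of existing objects happens under the lock.
            old = self.sockets
            self.sockets = fresh
        return old


pool = FakePool()
pool.sockets.add('sock-1')
old = pool.reset_safer()
print(sorted(old), len(pool.sockets))
```

Both methods leave the pool in the same state; they differ only in whether the allocation happens inside or outside the locked region.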

In 3.0, I plan to enqueue calls to MongoClient.close_cursor so they are not called during garbage collection, but are scheduled to be called later.
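
One way such deferral could look is sketched below. All names here (FakeClient, close_cursor_later, process_pending) are hypothetical and do not describe PyMongo's actual implementation. The destructor only appends the cursor id to a collections.deque, whose append and popleft are documented as thread-safe, and ordinary code drains the queue later when it is safe to take the lock.

```python
import collections
import threading


class FakeClient(object):
    """Hypothetical client; names are illustrative, not PyMongo's API."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = collections.deque()  # cursor ids waiting to be closed
        self.closed = []

    def close_cursor_later(self, cursor_id):
        # Safe to call from a destructor: it takes no lock.
        self.pending.append(cursor_id)

    def process_pending(self):
        # Called from ordinary code, for example a periodic background task.
        while True:
            try:
                cursor_id = self.pending.popleft()
            except IndexError:
                break
            with self.lock:
                # Stands in for sending the close-cursor message.
                self.closed.append(cursor_id)


class FakeCursor(object):
    def __init__(self, client, cursor_id):
        self.client = client
        self.cursor_id = cursor_id

    def __del__(self):
        self.client.close_cursor_later(self.cursor_id)


client = FakeClient()
cursor = FakeCursor(client, 42)
with client.lock:
    del cursor            # under CPython the destructor runs here but only enqueues
client.process_pending()  # the lock is free now, so the close proceeds
print(client.closed)
```

Under CPython this prints [42]. The trade-off is that the server-side cursor stays open until the queue is next drained.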

Is related to: PYTHON-1272 Potential deadlock in an exhaust Cursor destructor (Closed)

Assignee: A. Jesse Jiryu Davis
Reporter: A. Jesse Jiryu Davis
Votes: 0
Watchers: 2

Created: Dec 12 2014 08:23:41 PM UTC
Updated: May 04 2017 08:32:16 PM UTC
Resolved: Jan 05 2015 08:31:55 PM UTC
Confidence Status Last Update: 15/Dec/14 8:45 PM
