Python Driver / PYTHON-799

Rare deadlock in Cursor destructor

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.8, 3.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible

      In PyPy and CPython 3.4, incompletely iterating a Cursor can cause a deadlock in MongoClient when the Cursor is garbage collected.

      Deadlock

      If a Cursor is left open and it goes out of scope, its __del__ method sends a message to close the server-side cursor. In rare cases this causes a deadlock: if sending the message requires a lock that is already held by any thread, including the current thread, then __del__ cannot proceed and the process hangs.
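
      The self-deadlock described above can be sketched without PyMongo. This is a minimal illustration, not the driver's code: client_lock, close_cursor_from_destructor and simulate_gc_while_locked are hypothetical names, and the destructor's acquire is made non-blocking so the demonstration reports the problem instead of hanging.

```python
import threading

# Hypothetical stand-in for the client's lock. threading.Lock is
# non-reentrant: the thread that holds it cannot acquire it again.
client_lock = threading.Lock()


def close_cursor_from_destructor():
    """Simulate the work a cursor destructor must do under the client lock.

    A real destructor would block here. This sketch uses a non-blocking
    acquire and returns False where the real code would hang.
    """
    acquired = client_lock.acquire(False)
    if acquired:
        client_lock.release()
    return acquired


def simulate_gc_while_locked():
    """Simulate a GC pass that runs the destructor while the lock is held."""
    with client_lock:
        # Imagine an allocation on this line triggers garbage collection,
        # which runs the destructor on this same thread.
        return close_cursor_from_destructor()


print(close_cursor_from_destructor())  # lock is free: destructor can proceed
print(simulate_gc_while_locked())      # lock is held: destructor cannot proceed
```

      With a blocking acquire, the second call would never return, which is the hang this issue describes.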

      In PyMongo 2.7+ and PyPy 2.4, this hang is easily reproduced:

      from pymongo import MongoClient
      
      n = 0
      
      client = MongoClient()
      collection = client.test.test
      collection.drop()
      collection.insert({} for _ in range(200))
      
      while True:
          cursor = collection.find()
          next(cursor)
          del cursor
          client.disconnect()
          n += 1
          if not n % 1000:
              print(n)
      

      On my system this always hangs before the first 1000 iterations. "del cursor" makes the cursor into garbage, but its destructor doesn't run immediately. The call to "disconnect" means that the next trip through MongoClient.__ensure_member executes a code path that allocates an object while holding the lock. There's a chance this allocation triggers PyPy's GC, which runs the cursor destructor, which requires the MongoClient's lock. Hence the deadlock.

      This bug was introduced when a lock was added to MongoClient in PYTHON-487, released in PyMongo 2.7.

      MongoReplicaSetClient has an older, simpler concurrency design, which does not take a lock in the close-cursor path.

      In PyMongo 3.0, the Topology class does many more allocations while holding a lock, so deadlocks in PyPy related to unclosed cursors are quite common. They occur once every five or ten runs of the test suite: the cursors left open by test_limit_and_batch_size often deadlock the process when they are garbage collected, dozens of tests later in the suite.

      Theoretical Deadlocks

      A rarer source of deadlocks may be in the connection pool, which can allocate a set object while holding a lock in Pool.reset. If I add a call to gc.collect in Pool.reset it deadlocks, but I can't prove a garbage collection is ever triggered in the real world in that code path, so the risk is theoretical. This applies equally to MongoClient and MongoReplicaSetClient.

      In CPython 3.4, a deadlock can theoretically result if an open cursor is referenced by cyclic garbage:

      while True:
          cursor = collection.find()
          next(cursor)
          d = {'c': cursor}
          d['d'] = d
          del cursor, d
          client.disconnect()
          n += 1
          if not n % 1000:
              print(n)
      

      This causes delayed garbage collection, as in PyPy, so the cursor's destructor can run on a thread that already holds the lock. In CPython versions before 3.4, cyclic garbage that referred to a Cursor was never freed, because Cursor defines a __del__ method. In any case, I can't cause this theoretical deadlock in the real world.
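
      The delayed finalization of cyclic garbage can be observed in isolation. This is a small sketch, assuming CPython 3.4 or later (where PEP 442 allows objects with __del__ in reference cycles to be finalized); Tracked is a hypothetical stand-in for Cursor, and automatic collection is disabled only so the timing is deterministic.

```python
import gc

events = []


class Tracked(object):
    """Hypothetical stand-in for a Cursor with a destructor."""

    def __del__(self):
        events.append('finalized')


gc.disable()  # keep the cyclic collector from running on its own

obj = Tracked()
holder = {'c': obj}
holder['self'] = holder  # the dict refers to itself, forming a cycle
del obj, holder          # now unreachable, but reference counts stay above zero

before = list(events)    # destructor has not run yet
gc.collect()             # the cyclic collector finds the garbage
after = list(events)     # destructor ran during collection

gc.enable()
print(before, after)
```

      The destructor runs inside gc.collect, on whichever thread happens to trigger the collection, rather than at the del statement.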

      Fixes

      In PyMongo 2.8 the simplest fix is to do object allocations before taking a lock.
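
      A rough sketch of that idea, using a hypothetical SketchPool class rather than PyMongo's actual Pool: the new set is created before the lock is taken, so the critical section only rebinds references and gives the garbage collector less opportunity to run while the lock is held.

```python
import threading


class SketchPool(object):
    """Hypothetical pool; not PyMongo's Pool implementation."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sockets = set()

    def reset(self):
        # Allocate the replacement set *before* taking the lock.
        fresh = set()
        with self.lock:
            # Only reference rebinding happens under the lock.
            old = self.sockets
            self.sockets = fresh
        # The caller can close the old sockets outside the lock.
        return old


pool = SketchPool()
pool.sockets.add('socket-1')
old_sockets = pool.reset()
print(old_sockets, pool.sockets)
```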

      In 3.0, I plan to enqueue calls to MongoClient.close_cursor so they are not called during garbage collection, but are scheduled to be called later.
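
      One possible shape for that plan, shown as a sketch under assumptions and not as the PyMongo 3.0 implementation: SketchClient, process_pending and the killed list are hypothetical. The destructor path only appends to a deque and never takes the lock; ordinary code drains the queue later.

```python
import collections
import threading


class SketchClient(object):
    """Hypothetical client; not PyMongo's MongoClient."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = collections.deque()
        self.killed = []  # stands in for "kill cursors" messages sent

    def close_cursor(self, cursor_id):
        # Safe to call from a destructor: no lock is taken here.
        self._pending.append(cursor_id)

    def process_pending(self):
        # Called later from ordinary code, never from garbage collection.
        while True:
            try:
                cursor_id = self._pending.popleft()
            except IndexError:
                break
            with self._lock:
                self.killed.append(cursor_id)


class SketchCursor(object):
    """Hypothetical cursor whose destructor only enqueues the close."""

    def __init__(self, client, cursor_id):
        self.client = client
        self.cursor_id = cursor_id

    def __del__(self):
        self.client.close_cursor(self.cursor_id)
```

      For example, calling close_cursor while the lock is held returns immediately, and the cursor is only "killed" once process_pending runs:

      client = SketchClient()
      with client._lock:
          client.close_cursor(42)
      client.process_pending()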

            Assignee: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Reporter: A. Jesse Jiryu Davis (jesse@mongodb.com)