  1. Python Driver
  2. PYTHON-1005

Have to close cursors explicitly to get correct documents in multithreaded high-load Django app

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.0.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      Ubuntu v14.04
      Mongodb v2.6.11
      PyMongo v3.0.3

      12 core CPU
      16 GB RAM
    • # Replies:
      20
    • Last comment by Customer:
      true

      Description

      Hi,

      We are using MongoDB in one of our Django applications, which runs under uWSGI. When we run it on a weak system (2-core CPU, 2 GB RAM) under high load, handling requests in 6 parallel processes (2 requests each), nothing unusual happens.
      But when we run it on a more powerful system (12-core CPU, 16 GB RAM), the returned cursor contains fewer results than expected. For example, if 8000 documents in our collection match a query, the cursor sometimes contains far fewer, or even 0, documents.
      This does not happen as soon as the code starts running. The application has to run for a while and process about 100,000 requests before it starts returning wrong responses.
      Respawning the uWSGI processes reduces the problem, but it still happens sometimes.
      We then changed our code to explicitly close every cursor as soon as we no longer need it, and the problem went away completely. What is the reason for this behavior? Since pymongo has a garbage collector, it should not be necessary to close cursors explicitly.
      And why, when we do not close them, does pymongo use old cursors instead of creating new ones?

      Thank you all.

        Activity

        5hahinism Shahin Azad added a comment -

        Before this, I was calling a cursor generator function (which takes a query item and returns the relevant find cursor) from within my Django views. Django is running on multiprocess, threaded uWSGI. Here is my uWSGI command:

        exec newrelic-admin run-program uwsgi --enable-threads \
             --single-interpreter \
             --http :$PORT \
             --wsgi afapi.wsgi \
             --processes 6 \
             --stats /tmp/afapi.socket \
             --max-requests 2000 \
             --max-worker-lifetime 1800
        

        behackett Bernie Hackett added a comment -

        it must be accessed by only one thread at a time.

        Not just one thread at a time, but one thread period. Iterating a single Cursor in multiple threads will lead to very confusing results.

        Can you show us your cursor generator? The use of a generator makes it sound like you might be yielding the same Cursor object multiple times. If so, that is definitely the problem, and not a bug in PyMongo.
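
To illustrate the concern about handing the same Cursor object to more than one consumer, here is a minimal sketch that needs no database. `FakeCursor` and `make_cursor` are hypothetical stand-ins, not PyMongo classes; they model only the fact that a cursor is a one-shot iterator. Real cursors also carry server-side state and batching, which this sketch does not attempt to reproduce, and the consumers here run one after the other rather than in real threads.

```python
class FakeCursor:
    """Hypothetical stand-in for a database cursor: a one-shot iterator."""

    def __init__(self, docs):
        self._docs = iter(docs)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._docs)


def make_cursor():
    # Stands in for a fresh find() call that matches 8 documents.
    return FakeCursor({"n": i} for i in range(8))


# Anti-pattern: one cursor object consumed by two callers.
shared = make_cursor()
first = list(shared)
second = list(shared)  # the shared cursor is already exhausted here

# Safer: every caller builds its own cursor.
own_a = list(make_cursor())
own_b = list(make_cursor())

shared_counts = (len(first), len(second))
own_counts = (len(own_a), len(own_b))
```

In this toy model the second consumer of the shared cursor gets nothing, while callers that each create their own cursor both see every document.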

        5hahinism Shahin Azad added a comment - - edited

        Here is the function:

        from pymongo import ASCENDING, DESCENDING

        # settings, SortMethod, mydb, ENABLED_APPS_QUERY and DEFAULT_FIELDS
        # are defined elsewhere in the application.

        def get_items(cat=None, sort=SortMethod.DATE_ADDED, page=0,
                      limit=settings.DEFAULT_PAGE_SIZE, extra_query=None,
                      device=None, debug=False):
            # Work on a shallow copy so the shared base query is not modified.
            query = ENABLED_APPS_QUERY.copy()

            if cat:
                query['cat'] = cat

            if device:
                query['dev'] = device.lower()

            if extra_query:
                query.update(extra_query)

            # Each call creates a new cursor for the requested page.
            items = mydb.elems.find(query, projection=DEFAULT_FIELDS, skip=page * limit, limit=limit)

            # Apply the requested ordering to the cursor.
            if sort == SortMethod.DATE_ADDED:
                items = items.sort('upd', DESCENDING)
            elif sort == SortMethod.ALPHABETICAL:
                items = items.sort('nam', ASCENDING)
            elif sort == SortMethod.RATING:
                items = items.sort('bayesian_rate', DESCENDING)
            elif sort == SortMethod.DOWNLOADS:
                items = items.sort('dl', DESCENDING)

            return items
        

        In my views, after calling this function, I apply list() to the returned cursor to build the list used for the JSON response.
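
The list() step in the view can be combined with the explicit close mentioned in the description. The following is only a sketch of one way to do that: `fetch_all` is a hypothetical helper, and `FakeCursor` is a stand-in that exposes just iteration and `close()`, so the example runs without a MongoDB server. `contextlib.closing` works with any object that has a `close()` method.

```python
from contextlib import closing


def fetch_all(cursor):
    """Drain the cursor into a list, then close it even if iteration fails."""
    with closing(cursor):
        return list(cursor)


class FakeCursor:
    """Hypothetical stand-in exposing only iteration and close()."""

    def __init__(self, docs):
        self._docs = iter(docs)
        self.closed = False

    def __iter__(self):
        return self._docs

    def close(self):
        self.closed = True


cursor = FakeCursor([{"nam": "a"}, {"nam": "b"}])
items = fetch_all(cursor)
```

In a view this would presumably be used as `items = fetch_all(get_items(cat=...))`, so that each request builds, drains, and closes its own cursor inside a single thread.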

        behackett Bernie Hackett added a comment -

        mydb.elems.find(query, projection=DEFAULT_FIELDS, skip=page * limit, limit=limit)

        The skip and limit explain why your calls to count() seem wrong. By default, Cursor.count doesn't take skip and limit into account. You have to use the with_limit_and_skip option:

        https://api.mongodb.org/python/current/api/pymongo/cursor.html#pymongo.cursor.Cursor.count
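
The gap between the two numbers can be worked through with plain arithmetic. In this sketch, `docs_returned` is a hypothetical helper (not a PyMongo function), the total of 8000 comes from the description, and the page size of 20 is an assumed value because `settings.DEFAULT_PAGE_SIZE` is not given in the ticket.

```python
def docs_returned(total_matches, skip=0, limit=0):
    """Number of documents a query yields once skip and limit are applied.

    limit=0 is treated as "no limit", following MongoDB's convention.
    """
    remaining = max(total_matches - skip, 0)
    return remaining if limit == 0 else min(remaining, limit)


total = 8000    # documents matching the query (figure from the description)
page_size = 20  # assumed value; the real DEFAULT_PAGE_SIZE is not shown

first_page = docs_returned(total, skip=0 * page_size, limit=page_size)
last_page = docs_returned(total, skip=399 * page_size, limit=page_size)
past_the_end = docs_returned(total, skip=400 * page_size, limit=page_size)
```

Under these assumptions a plain `cursor.count()` would report 8000 for all three pages, whereas `cursor.count(with_limit_and_skip=True)` should line up with the smaller per-page figures, including 0 for a page past the end.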

        Another thing to note: skip is expensive, and using it for paging is a bad idea. There are better ways. See this Stack Overflow answer from another MongoDB employee:

        http://stackoverflow.com/questions/5049992/mongodb-paging
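
One commonly used alternative to skip is range-based paging: remember the sort key of the last document on the current page and ask for documents that come after it. This may or may not be exactly what the linked answer recommends. The sketch below only builds the filter and sort specification as plain Python structures, so it runs without a server. `next_page_filter`, `SORT_SPEC`, the `games` category, and the sample `upd` and `_id` values are all hypothetical, and the approach would likely need an index covering `upd` and `_id` to be fast.

```python
DESCENDING = -1  # same value as pymongo.DESCENDING


def next_page_filter(base_query, last_upd=None, last_id=None):
    """Build a filter for the page that follows the document (last_upd, last_id).

    With no position given, a copy of the base query is returned (first page).
    """
    if last_upd is None:
        return dict(base_query)
    after_last = {
        "$or": [
            {"upd": {"$lt": last_upd}},
            # Tie-breaker for documents that share the same 'upd' value.
            {"upd": last_upd, "_id": {"$lt": last_id}},
        ]
    }
    return {"$and": [dict(base_query), after_last]}


# Sort on the paging key plus _id so the order is total and stable.
SORT_SPEC = [("upd", DESCENDING), ("_id", DESCENDING)]

first = next_page_filter({"cat": "games"})
following = next_page_filter({"cat": "games"}, last_upd=1700, last_id=42)

# Intended use (not executed here):
#   mydb.elems.find(following, projection=DEFAULT_FIELDS, limit=limit).sort(SORT_SPEC)
```

The trade-off is that this style supports "next page" navigation rather than jumping to an arbitrary page number, and the projection has to include the fields used as the paging position.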

        I think at this point we've established that there is no PyMongo bug here, so I'm going to close this ticket. Further questions about using cursors or fast paging, or anything else about application design should be asked on mongodb-user or stackoverflow.

        5hahinism Shahin Azad added a comment -

        Thank you for your help and tips. As you said, using cursors in a thread-safe way solved all our issues.

