Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-36571

cursor disappears before cursorTimeoutMillis (and its not a timeout)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical - P2
    • Resolution: Duplicate
    • 4.0.0, 4.0.1
    • None
    • None
    • None
    • ALL
    • Hide

      set cursorTimeoutMillis = 7200000 (2 hours) on the test db

      in python:  (tested with pymongo 3.7.1)

      open a cursor (with enough data so mongo will have to do getMore requests to get all the data); get a few records, then wait for 60mn, then try to get the remainng records from the cursor. 

      Cursor crashes: CursorNotFound

       

      You'll find attached a program to reproduce this and its log

       

      Show
      set cursorTimeoutMillis = 7200000 (2 hours) on the test db in python:  (tested with pymongo 3.7.1) open a cursor (with enough data so mongo will have to do getMore requests to get all the data); get a few records, then wait for 60mn, then try to get the remainng records from the cursor.  Cursor crashes: CursorNotFound   You'll find attached a program to reproduce this and its log  

    Description

      Context:
      We have batchs that get data in db (cursor count > 5M records) and, for each record, compute various data. This computation can be quite long and the delta time between 2 request to getMore can be much more than 10mn (cursorTimeout). So we run our database with cursorTimeoutMillis = 7200000 (2 hours) and everything run well on our current db (mongoDB 3.0.x).

      When trying MongoDb 4.0 (currently 4.0.1) we run into an issue : some programs crash with CursorNotFound (all our programs are in python).

       

      On specific tests program we can see that the cursor vanishes 30 to 40mn after opening the cursor

       

      So I tried to reproduce the problem with a small script that first get a few records from a cursor, than wait for some
      (long) time and then get the remaing records. While waiting, I'm getting sessions list and serverStatus().metrics
      .cursor

      cursor total == 1 (our cursor)

      Within 5mn, we see sessions 2 appearing (the cache seems to be updated every 5mn)
      one of these sessions (which seems to be related to the cursor as we can see in mongodb.log (verbosity=5)) has its lastUse that does not change over time (seems coherent).

      About 30mn later we have only one session left. The session related to the cursor is not there anymore. I guess its because of liidTTLIndex on config.system.sessions

      {
       "v" : 2,
       "key" : {
       "lastUse" : 1
       },
       "name" : "lsidTTLIndex",
       "ns" : "config.system.sessions",
       "expireAfterSeconds" : 1800
       }
      

      5mn later (session cache refresh ?) serverStatus.metrics.cursor has a total of 0 and if we stop waiting, our for loop
      on the cursor ends with a CursorNotFound error.

      I guess that if the index had a expireAfterSeconds greater than cursorTimeoutMillis / 1000 (to get seconds), the issue would disappear.

       

      This issue is critical for us. We cannot upgrade until it is solved

      Attachments

        1. test_cursor_expiration.log
          27 kB
        2. test_cursor_expiration.py
          2 kB

        Issue Links

          Activity

            People

              nick.brewer Nick Brewer
              rj-10gen@arsynet.com Remi Jolin
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: