[SERVER-24474] cursor next/hasnext throws MongoException at random Created: 08/Jun/16  Updated: 18/Jan/17  Resolved: 04/Oct/16

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: bob whitehurst Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-24723 Error "Cursor not found" while cursor... Closed
Related
Operating System: ALL
Steps To Reproduce:

populate a collection with somewhere around 100k entries.
do until the exception occurs
   get a cursor from a query on the collection
   while hasNext()
       next()
   close cursor

Participants:

 Description   

Since we moved to a sharded cluster, we have had several cursor issues that occur at random. One issue we are having is a timeout condition in which a MongoException is thrown. All of the information that I have found indicates that this is caused by the cursor not being used within 10 minutes. However, I know this is not the case here, because I was able to create a test case that reproduces the problem. It doesn't happen very often, but it does occur.

Our application is primarily write-oriented: multiple threads extract data from external sources and write it to one or more collections. There may be 1 or 30 million entries in a collection at the end of the data-gathering process. This data is then processed and filtered into other collections. While processing and filtering the data, a cursor is usually obtained. Most of the time there isn't a problem, but every so often the MongoException is thrown while processing the cursor. Altering the batch size for the cursor doesn't seem to change anything.

We actually have recovery code in place that catches the exception and creates a new cursor and then does a skip() to the current count. This error will occur anywhere from 10 to 40 times a day. I am not sure how many cursors are actually opened but the number of times this occurs is enough to warrant attention.
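The recovery approach described above (catch the exception, open a new cursor, skip to the current count) could look roughly like the sketch below. It is not the reporter's actual code. It uses only the Java standard library so it stays self-contained: `ResumableScan`, `scan`, `openAt`, and `maxRetries` are hypothetical names, and `openAt` stands in for whatever reopens the query with a skip applied. Skip-based resumption only returns to the same position if the query has a stable sort order and the underlying data is not changing between attempts.

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Sketch of a "catch, reopen, skip" recovery loop (hypothetical helper). */
public class ResumableScan {

    /**
     * Iterates a query to completion, reopening it after a failure.
     *
     * @param openAt     assumed factory: opens the query and skips the given
     *                   number of results before returning an iterator
     * @param handler    called once for every document read
     * @param maxRetries number of reopen attempts allowed before giving up
     * @return the number of documents processed
     */
    public static <T> int scan(IntFunction<Iterator<T>> openAt,
                               Consumer<T> handler,
                               int maxRetries) {
        int processed = 0; // doubles as the skip offset on reopen
        int retries = 0;
        while (true) {
            Iterator<T> cursor = openAt.apply(processed);
            try {
                while (cursor.hasNext()) {
                    handler.accept(cursor.next());
                    processed++;
                }
                return processed;
            } catch (RuntimeException e) {
                // Stand-in for MongoException, which is a RuntimeException in
                // the Java driver; real code should catch the narrower type so
                // that handler failures are not retried by mistake.
                if (++retries > maxRetries) {
                    throw e;
                }
            }
        }
    }
}
```

With the Java driver, `openAt` would presumably wrap a call that rebuilds the same find (same filter, projection, sort, and batch size) with the skip applied, and the old cursor would be closed before reopening. Capping the retries keeps a persistent failure from looping forever.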



 Comments   
Comment by Kelsey Schubert [ 04/Oct/16 ]

Hi bmwmaestoso,

Thanks for the detailed follow-up. Unfortunately, we have not been able to reproduce this issue. My understanding is that the workaround you have in place has minimized the impact of this issue. If you are able to determine situations in which this behavior does not occur, it may help us tailor our reproduction to hit this issue.

Thanks again,
Thomas

Comment by bob whitehurst [ 09/Jun/16 ]

I have been able to reproduce this on a number of different configurations.
3 shards
3 config servers
1 mongos

I have looked in the log files and haven't found anything that would indicate there is any kind of problem. Our environment is on a secure network and I cannot transfer anything out without it going through a big review process that could take a long time.

We are using the Java artifact org.mongodb.mongo-java-driver version 3.2.2. This problem also occurred using version 3.0.4.

It should be very easy to create a test case for this. I basically outlined this in the steps to reproduce. It should only be a few lines of code. All you need to do is populate a collection with anywhere from 60k to 100k entries. It may take a few iterations before the MongoException is thrown, so you might want to put a while loop around this code. Other aspects that may affect the results from the find:

  • we are using a projection to filter out 3 of 7 fields in the documents
  • a sort is used
  • the batch size is set to 150

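// find(...) returns a FindIterable; iterator() yields the MongoCursor
MongoCursor<Document> cursor = collection.find(...).iterator();
try {
    while (cursor.hasNext()) {
        cursor.next();
    }
} finally {
    cursor.close();
}
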
Comment by Ramon Fernandez Marina [ 08/Jun/16 ]

bmwmaestoso, we'll need more information to determine if there's a bug in MongoDB. Please provide:

  • Detailed information about your setup (number of servers, mongos, etc.)
  • Any logs from mongos nodes you're using when you get the exception
  • The language and driver version you're using

You mention you have a reproducer – can you please share it with us? It would be of great help trying to understand what's going on.

Thanks,
Ramón.

Generated at Thu Feb 08 04:06:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.