[SERVER-34053] Cursor not found error when running long query on secondary with noCursorTimeout Created: 21/Mar/18 Updated: 25/Jun/21 Resolved: 17/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.6.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Emily Stolfo | Assignee: | Misha Tyulenev |
| Resolution: | Duplicate | Votes: | 3 |
| Labels: | SWNA | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||
| Sprint: | Sharding 2018-05-21 | ||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Description |
|
I’ve had two Ruby tickets reporting that a user gets a cursor id not found error with the latest version of the server and the Ruby driver. I’ve been able to reproduce it (see below for code and logs). Here are the details:
I confirmed that:
I can provide the user's primary and secondary logs, as well as application logs. Here are the Ruby tickets: I was able to reproduce this with the following Ruby code: This is the driver log: |
| Comments |
| Comment by Sahi kakkar [ 25/Jun/21 ] |
|
I am using 2.11.1 version and getting same issue MongoDB.Driver.MongoCursorNotFoundException: Cursor 2433389218078643918 not found on server {Primary} using connection 78488. I am running a aggregation query to get data in consumer channel and processing async. based on my analysis i found below could be the issue. but not sure.
I would really appreciate help on this. |
| Comment by Avner Cohen [ 27/Oct/19 ] |
|
In this doc: https://docs.mongodb.com/manual/core/read-preference-mechanics/#read-preference-for-replica-sets I see: The read preference is re-evaluated for each operation. I am not sure now, how can a server side cursor work with this limitation.
Would appreciate help on that. |
| Comment by Avner Cohen [ 23/Oct/19 ] |
|
Still seeing it, even with latest 4.2. After some investigation, and crossing data with other cases on the www (excluding all the cases of real 10 minutes timeouts)I suspect the issue is related to a query with secondaryPreferred. Since cursor is server side, I am guessing that in some scenario, the query is hitting a different secondary. Is that in any way possible ? |
| Comment by Dan Dascalescu [ 08/Apr/19 ] |
|
I'm still seeing this issue on mLab, with MongoDB server version: 3.6.8. I can't post the query here publicly, but I'm happy to grant MongoDB/mLab permission to look at our query and data if that helps. Or perhaps we need to upgrade the server version? E QUERY [js] Error: getMore command failed: {E QUERY [js] Error: getMore command failed: { "ok" : 0, "errmsg" : "cursor id 97741904259 not found", "code" : 43, "codeName" : "CursorNotFound", "operationTime" : Timestamp(1554677331, 5), "$clusterTime" : { "clusterTime" : Timestamp(1554677331, 5), "signature" : { "hash" : BinData(0,"erU/NyFSNqzDCBeNyxhwSRjWXcQ="), "keyId" : NumberLong("6619795293317627907") }}} :_getErrorWithCode@src/mongo/shell/utils.js:25:13DBCommandCursor.prototype._runGetMoreCommand@src/mongo/shell/query.js:815:1DBCommandCursor.prototype._hasNextUsingCommands@src/mongo/shell/query.js:846:9DBCommandCursor.prototype.hasNext@src/mongo/shell/query.js:854:1DBQuery.prototype.hasNext@src/mongo/shell/query.js:294:13DBQuery.prototype.forEach@src/mongo/shell/query.js:500:12@(shell):1:1 |
| Comment by Misha Tyulenev [ 10/Sep/18 ] |
|
jussi@vainu.io there is another issue that has similar symptoms tracked in |
| Comment by Jussi Kauppila [ 10/Sep/18 ] |
|
Hi, We are using 4.0.1 in Atlas and are still experiencing this all the time. Longops with no_cursor_timeout set to true raise CursorNotFound error when ran with Secondary Read Preference. With Primary everything works as expected. Patch that fixes https://jira.mongodb.org/browse/SERVER-34810 should be included in the 4.0.1. Maybe there is still something going on. I can look in to this deeper at some point but currently the fix for us is to run long queries from Primary. We are using the latest pymongo 3.7.1 to connect. I think this should be reopened.
|
| Comment by Misha Tyulenev [ 02/May/18 ] |
|
Thanks david.storch! Its hard to say if its a dup without looking at the reproducible test case. The major difference is that |
| Comment by David Storch [ 02/May/18 ] |
|
misha.tyulenev, I found |
| Comment by Misha Tyulenev [ 01/May/18 ] |
|
emily.stolfo could you please clarify why the find command does not have clusterTime in it? is this the first command? https://gist.github.com/estolfo/1bc93c544e25b71678d9389a306f5292#file-no_cursor_found_driver_logs-rb-L98 |
| Comment by Misha Tyulenev [ 13/Apr/18 ] |
|
Hi pavel.duchovny could you please include server logs? Thanks! |
| Comment by Charlie Swanson [ 23/Mar/18 ] |
|
Moving this over to the Sharding backlog because we suspect the issue is caused by a session timeout. It's unclear to the query team if it's a driver issue or a server issue. |
| Comment by Emily Stolfo [ 22/Mar/18 ] |
|
Hi david.storch I patched the driver to not associate a session with the find and getmore operations and did not get the error with read preference secondaryPreferred. I've added the test log in the gist. |
| Comment by David Storch [ 21/Mar/18 ] |
|
Thanks, Emily. I currently suspect that this is related to the session getting killed. Is it possible for you to run the repro in such a way that the cursor does not belong to a logical session? If the problem no longer repros, that would help bolster my case. |
| Comment by Emily Stolfo [ 21/Mar/18 ] |
|
I am testing against version 3.6.3. I've just tested with read preference primary and I did not get the cursor not found error. I've added the driver logs to the bottom of the gist. |
| Comment by David Storch [ 21/Mar/18 ] |
|
emily.stolfo, against which versions of the server can you reproduce the problem? Does your Ruby script continue to reproduce the problem when the cursor is open on the primary instead of a secondary? |
| Comment by Alberto Ornaghi [ 21/Mar/18 ] |
|
Hi, i'm the reporter of |