[SERVER-6036] Disable cursor timeout for cursors that belong to a session Created: 07/Jun/12 Updated: 19/Dec/22 Resolved: 09/Nov/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0, 4.4.8 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | James Wahlin |
| Resolution: | Done | Votes: | 46 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Backport Requested: | v4.4 |
| Sprint: | Query 2020-10-19, Query 2020-11-16 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 50 |
| Description |
When a cursor is opened as part of a session, its lifetime is tied to that session: closing or timing out the session kills all of its associated cursors. Given this, we can remove the separate cursor timeout mechanism for cursors that live as part of a session, and rely on session cleanup to dispose of orphaned cursors.
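A rough mongo-shell sketch of this coupling (the collection name and batch size are arbitrary):

```javascript
// Open a cursor inside an explicit session; "test.data" is arbitrary.
var session = db.getMongo().startSession();
var sessionDb = session.getDatabase("test");

// A small batchSize keeps the cursor open on the server after the
// first batch is returned.
var cursor = sessionDb.data.find().batchSize(2);
cursor.next();

// Killing the session also kills every cursor opened under it, which
// is why a separate idle timeout is redundant for session cursors.
db.adminCommand({ killSessions: [session.getSessionId()] });

// A subsequent getMore on the cursor now fails immediately, instead
// of the cursor lingering until a 10-minute idle timeout fires.
```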
| Comments |
| Comment by Githook User [ 26/Jul/21 ] |
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}
Message: (cherry picked from commit 317132d60584706c660164f74f51b81015ecdd72)
| Comment by Githook User [ 26/Jul/21 ] |
Author: {'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}
Message: (cherry picked from commit 920493f0c2b32fa43743934f6025790e7cf496e1)
| Comment by Githook User [ 26/Jul/21 ] |
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}
Message: (cherry picked from commit 25007083358b088afcc250969c1504840105ac5d)
| Comment by Githook User [ 09/Nov/20 ] |
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}
Message:
| Comment by Githook User [ 09/Nov/20 ] |
Author: {'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com'}
Message:
| Comment by Githook User [ 09/Nov/20 ] |
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}
Message:
| Comment by James Wahlin [ 23/Jul/20 ] |
rj-10gen@arsynet.com, I agree: with session timeout killing associated cursors, a separate cursor timeout mechanism makes less sense. We will repurpose this ticket to remove the timeout mechanism for any cursor that is opened as part of a session.
| Comment by Remi Jolin [ 15/Jun/20 ] |
Following https://jira.mongodb.org/browse/SERVER-46918 and the discussion of cursor timeouts, I was wondering: now that we have sessions, and cursors associated with a session expire when that session expires, why do we still need a separate expiration for cursors? There is already a mechanism to refresh sessions; perhaps that would be enough if cursors had no expiration period of their own.
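For reference, the refresh mechanism looks roughly like this in the shell (drivers normally send it automatically for sessions they are actively using):

```javascript
// Start a session; by default it expires after 30 minutes of inactivity.
var session = db.getMongo().startSession();

// Each refreshSessions call resets the session's last-use time. With
// cursor lifetime tied to the session, this is also what would keep
// the session's cursors alive.
db.adminCommand({ refreshSessions: [session.getSessionId()] });
```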
| Comment by Philip Sultanescu [ 20/Feb/20 ] |
Thanks for the snippet. Your pipeline will run much faster if you add proper indexing on the collection; I usually use Studio 3T to add indexes on the sort and group fields. For memory efficiency I ended up executing my batches filtered by another field that I know won't return more than 100,000 rows per batch. For example:
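A hypothetical sketch of this batching approach (the collection, field names, and batch keys are all made up):

```javascript
// Run the pipeline once per value of a bounding field ("region" here),
// so each run touches a limited slice of the collection and finishes
// before any cursor can go idle.
["EU", "US", "APAC"].forEach(function (region) {
  db.orders.aggregate([
    { $match: { region: region } },   // keeps each batch under ~100,000 docs
    { $sort: { createdAt: 1 } },      // backed by an index on { region: 1, createdAt: 1 }
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
  ]).forEach(printjson);
});
```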
| Comment by Justin Knight [ 20/Feb/20 ] |
Philip, I tried your suggestion and used limit and skip to split my aggregate query as a workaround. As you say it's not as efficient, but it means I can still use my aggregate query. Here's a snippet of my Go code in case it's useful for anyone:
| Comment by Philip Sultanescu [ 30/Jan/20 ] |
I was thinking of using limit and skip to split a long-running aggregate query into multiple ones, similar to pagination (see the sketch below). The problem is that the sort will then have to be re-executed for each page, which is less efficient; on the other hand, it would solve the idle timeout problem. Has anyone tried to do this too?
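A minimal shell sketch of that limit/skip pagination (the collection name and page size are hypothetical):

```javascript
// Append $sort/$skip/$limit to the pipeline so each page runs as a
// fresh, short-lived cursor. The whole pipeline (including the sort)
// re-executes per page, which is the inefficiency noted above.
var pipeline = [ /* the long-running stages go here */ ];
var pageSize = 50000;
for (var page = 0; ; page++) {
  var batch = db.events.aggregate(pipeline.concat([
    { $sort: { _id: 1 } },        // stable order so pages do not overlap
    { $skip: page * pageSize },
    { $limit: pageSize }
  ])).toArray();
  batch.forEach(printjson);
  if (batch.length < pageSize) break;  // last page reached
}
```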
| Comment by Anatoliy Lane [ 09/May/19 ] |
Has there been any progress on this issue, or an ETA? I can confirm this seems to be happening on a sharded cluster for large data sets.
Using
| Comment by Oleg Rekutin [ 28/Mar/17 ] |
Jörg, the only practical solution seems to be to increase the cursorTimeoutMillis parameter (see
The problem is that the cursor times out individually on a single shard if, during the aggregation, no documents are fetched from that shard for 10 minutes. This kills the entire fetch of the query or aggregation results.
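Raising it looks like this (the one-hour value is only an example; on a sharded cluster the parameter has to be set on every node that owns cursors):

```javascript
// Raise the idle-cursor timeout from the default 10 minutes (600000 ms)
// to 1 hour. Run against each mongod in the cluster (and each mongos,
// on versions where the parameter is supported there).
db.adminCommand({ setParameter: 1, cursorTimeoutMillis: 3600000 });
```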
| Comment by Jörg Rech [ 28/Mar/17 ] |
Aggregation of medium datasets (40 million documents) in clusters is still a problem with 3.4.2. When aggregating, we get the following error after ~2 hours:
An exception occurred while aggregating data com.mongodb.CommandFailureException: { "serverUsed" : "xxx.xxx.xxx.xxx:27017" , "ok" : 0.0 , "errmsg" : "cursor id 51287731144 didn't exist on server." , "code" : 13127 , "codeName" : "Location13127"}
The same aggregation works for datasets with 20 million documents and takes ~60 minutes. Is there any solution or workaround that could help? Is it just aggregation, or could a map/reduce help? Is a solution planned for an upcoming release? By the way, we also have datasets with 400 million and 3 billion documents that should be processed by this aggregation (even if it would take days or weeks), so if someone is working on a solution, please keep users like us in mind.
| Comment by Roy Reznik [ 24/Jan/17 ] |
The noTimeout option is not applicable here, since he presented an aggregation, and that is an operation that does not support the noTimeout option.
| Comment by Ramon Fernandez Marina [ 22/Sep/15 ] |
robinyarbrough@carfax.com, unfortunately there are no updates on this issue.
| Comment by Robin Yarbrough [ 21/Sep/15 ] |
We are also experiencing this issue when running a long-running aggregation query. These particular aggregation queries finished successfully until we upgraded from 2.6.7 to 3.0.4. We keep getting the following error after running for several days:
"exception: getMore: cursor didn't exist on server, possible restart or timeout?"
Are there any updates on this issue?
| Comment by Ben McCann [ 14/Aug/15 ] |
Can we allow the cursor timeout to be configured on a per-cursor basis instead of a per-server basis? Queries have a no-timeout option, but that's much too extreme. I just want to set it to something like 1 hour on a couple of them (not for all of them, as that's quite extreme as well!)
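For context, the existing all-or-nothing option looks like this in the shell of that era (newer shells expose it as noCursorTimeout(); the collection name is hypothetical):

```javascript
// Disable the idle timeout for this one cursor entirely. There is no
// middle ground such as "time out after 1 hour" on a per-cursor basis.
var cursor = db.data.find().addOption(DBQuery.Option.noTimeout);
```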
| Comment by Ian Whalen (Inactive) [ 15/May/15 ] |
Intentionally expanding the scope of this ticket to cover all work on cursor keepAlive.
| Comment by Daniel Pasette (Inactive) [ 02/Apr/15 ] |
This issue can be worked around now with
| Comment by Anton Kozak [ 21/Oct/14 ] |
See
| Comment by Anton Kozak [ 20/Oct/14 ] |
Hi,
MongoDB shell version: 2.6.1
| Comment by Parameswaran [ 15/Oct/14 ] |
I have the same issue. For now I have removed the shards, but the aggregation pipeline is taking more than 3 hours for me with 200 million records. Steven, how did you get the result in 13 minutes for 226 million? Is your machine really powerful?
| Comment by Steven Castelein [ 15/Oct/14 ] |
Any update on this issue? I'm running an aggregation pipeline on a big dataset (226 million records) that performed perfectly on a single mongodb instance (it finished in 13 minutes, which impressed me a lot). Then I set up a cluster of 4 shards and ran the aggregate again, hoping to see some performance increase; however, I got the same error as described in
Not only did running the query take longer (perhaps a fault on my side), it didn't even finish! I don't understand why these problems arise, because running a long aggregation pipeline on large datasets is exactly what MongoDB is designed for.
| Comment by Vincent [ 13/Sep/14 ] |
I'm falling into this too... I simply can't run a (big) aggregation query on my DB.