[SERVER-6036] Disable cursor timeout for cursors that belong to a session Created: 07/Jun/12  Updated: 19/Dec/22  Resolved: 09/Nov/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.8

Type: Improvement Priority: Major - P3
Reporter: Greg Studer Assignee: James Wahlin
Resolution: Done Votes: 46
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-57863 CursorNotFound since we doubled the n... Closed
is depended on by PYTHON-938 aggregation cursor keepalive Closed
Documented
is documented by DOCS-13976 Investigate changes in SERVER-6036: D... Closed
Duplicate
is duplicated by SERVER-6906 Potential cursor timeout at reduce st... Closed
is duplicated by SERVER-13358 long aggregation queries get a cursor... Closed
is duplicated by SERVER-57863 CursorNotFound since we doubled the n... Closed
is duplicated by SERVER-38123 Add a "cursor touch" call that would ... Closed
is duplicated by SERVER-46918 in a sharded db, cursor timeout on a ... Closed
is duplicated by SERVER-46885 Allow "refreshing" cursors as we can ... Closed
is duplicated by SERVER-15895 Aggregation Query returning exception... Closed
Related
related to SERVER-4800 mongos cursor handling with timeouts Closed
related to SERVER-26321 Long-running aggregations can artific... Closed
related to SERVER-15042 Add noCursorTimeout option to command... Closed
related to DOCS-15181 [SERVER] Clarify that noCursorTimeout... Backlog
related to SERVER-59573 Add setParameter which can be used to... Closed
related to DRIVERS-1602 Automate session refresh for long-liv... Backlog
related to DOCS-4164 Cursor do not keep alive in a sharded... Closed
is related to SERVER-27009 Replication initial sync creates curs... Closed
is related to SERVER-8188 Configurable idle cursor timeout Closed
Backwards Compatibility: Minor Change
Backport Requested:
v4.4
Sprint: Query 2020-10-19, Query 2020-11-16
Participants:
Case:
Linked BF Score: 50

 Description   

When a cursor is opened as part of a session, its lifetime is tied to that session: closing the session, or letting it time out, kills all of its associated cursors. Given this, we can remove the separate cursor timeout mechanism for cursors that belong to a session and rely on session cleanup to handle orphaned cursors.
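For illustration, a rough mongosh sketch of the behavior this implies (`startSession` and `refreshSessions` are the real shell/server commands; the collection name is illustrative): a cursor opened in an explicit session is no longer subject to the idle cursor timeout, and refreshing the session keeps it alive; drivers refresh active sessions automatically.

```javascript
// Sketch (mongosh): a cursor opened in an explicit session lives as long
// as the session does. Collection name "events" is illustrative.
const session = db.getMongo().startSession();
const coll = session.getDatabase("test").getCollection("events");
const cursor = coll.aggregate([{ $match: {} }], { batchSize: 100 });

// The session (and therefore the cursor) can be kept alive explicitly;
// drivers refresh active sessions automatically every few minutes.
db.adminCommand({ refreshSessions: [session.getSessionId()] });
```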



 Comments   
Comment by Githook User [ 26/Jul/21 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: SERVER-6036 Adjust tests for cursor timeout to disable implicit session creation

(cherry picked from commit 317132d60584706c660164f74f51b81015ecdd72)
Branch: v4.4
https://github.com/mongodb/mongo/commit/85938f0ae7c81ecca90bdfd6addd2d9446cf0d3d

Comment by Githook User [ 26/Jul/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-6036 Disable cursor timeout for cursors that belong to a session

(cherry picked from commit 920493f0c2b32fa43743934f6025790e7cf496e1)
Branch: v4.4
https://github.com/mongodb/mongo/commit/fc1eff70e7ce3f0dd53b884246723982fe6b6aac

Comment by Githook User [ 26/Jul/21 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: SERVER-6036 Disable cursor timeout for cursors that belong to a session

(cherry picked from commit 25007083358b088afcc250969c1504840105ac5d)
Branch: v4.4
https://github.com/10gen/mongo-enterprise-modules/commit/249c9c864a3885f5f46d0b14209b8ec92cc5ec42

Comment by Githook User [ 09/Nov/20 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: SERVER-6036 Adjust tests for cursor timeout to disable implicit session creation
Branch: master
https://github.com/mongodb/mongo/commit/317132d60584706c660164f74f51b81015ecdd72

Comment by Githook User [ 09/Nov/20 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com'}

Message: SERVER-6036 Disable cursor timeout for cursors that belong to a session
Branch: master
https://github.com/mongodb/mongo/commit/920493f0c2b32fa43743934f6025790e7cf496e1

Comment by Githook User [ 09/Nov/20 ]

Author:

{'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}

Message: SERVER-6036 Disable cursor timeout for cursors that belong to a session
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/25007083358b088afcc250969c1504840105ac5d

Comment by James Wahlin [ 23/Jul/20 ]

rj-10gen@arsynet.com, I agree that with session timeouts killing associated cursors, a separate cursor timeout mechanism makes less sense. We will repurpose this ticket to remove the timeout mechanism for any cursor that is opened as part of a session.

Comment by Remi Jolin [ 15/Jun/20 ]

Following https://jira.mongodb.org/browse/SERVER-46918 and cursor timeouts, I was wondering:

Now that we have sessions, and cursors associated with an expired session also expire, why do we still need a separate expiration for cursors?

There is already a mechanism to refresh sessions; perhaps it would be enough if cursors had no expiration period of their own?

Comment by Philip Sultanescu [ 20/Feb/20 ]

Thanks for the snippet. Your pipeline will run very fast if you add proper indexes on the collection. I usually use Studio3T to add indexes on the sorting and grouping fields.

I ended up executing my batches filtered by another field that I know won't return more than 100,000 rows, for memory efficiency.

For example:

execute a pipeline to get the distinct folderIDs (I don't have more than 100,000 folders)
for each folderID in folderIDs {
    get a cursor from the aggregate pipeline filtered on folderID (easy to index)
    decode the results into a &files slice using the .All() method (a folder doesn't contain more than 100,000 files)
}

Comment by Justin Knight [ 20/Feb/20 ]

Philip, I tried your suggestion and used limit and skip to split my aggregate query to work around this. Like you say, it's not as efficient, but it means I can still use my aggregate query.

Here's a snippet of my go code in case it's useful for anyone:

// Due to this MongoDB limitation (https://jira.mongodb.org/browse/SERVER-6036) it isn't
// possible to set a cursor keepalive for aggregation queries, so we have to use skip
// and limit to grab a chunk at a time.
	skip := 0
	limit := 10000
	aggregateOptions := options.Aggregate().SetAllowDiskUse(true)
	for processed < total {
		log.Printf("Skipping %d, fetching next %d search terms", skip, limit)
		pipeline := []bson.M{
			{"$match": query},
			{"$group": bson.M{"_id": bson.M{"language": "$language", "searchTerm": "$searchTerm"}}},
			{"$sort": bson.M{"_id.searchTerm": 1}},
			{"$skip": skip},
			{"$limit": limit},
		}
		cur, err := collection.Aggregate(ctx, pipeline, aggregateOptions)
		if err != nil {
			return err
		}
		skip += limit
		// Caveat: deferring inside a loop keeps every cursor open until the
		// function returns; closing at the end of each iteration is safer.
		defer cur.Close(ctx)
		for cur.Next(ctx) {
...

Comment by Philip Sultanescu [ 30/Jan/20 ]

I was thinking of using limit and skip to split a long-running aggregate query into multiple ones, similar to pagination. The problem is that the sort will then have to be re-executed for each page, which is less efficient. On the other hand, it could solve the idle timeout problem.

Has anyone tried this?

Comment by Anatoliy Lane [ 09/May/19 ]

Has there been any progress on this issue, or an ETA?

I can confirm this seems to be happening on a sharded cluster for large data sets.

Using SERVER-8188 as a server-wide solution seems a bit extreme, and for a large corporate environment is rather infeasible.
We'd like to be able to extend the timeout to an hour for one or two queries, not for the entire daemon. Is there anything impeding a fix for this issue?

Comment by Oleg Rekutin [ 28/Mar/17 ]

Jörg, the only practical solution seems to be increasing the cursorTimeoutMillis parameter (see SERVER-8188 for more info). The default value is 10 minutes. With values of 60 or 80 minutes, you might find that all your aggregations complete.

The problem is that the cursor times out individually on a single shard if, during the aggregation, no documents are fetched from that shard for 10 minutes. This kills the entire fetch of the query or aggregation results.
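For reference, a sketch of that workaround (the parameter name cursorTimeoutMillis is real; the one-hour value is just an example):

```shell
# Raise the idle cursor timeout to one hour at startup (default: 600000 ms).
mongod --setParameter cursorTimeoutMillis=3600000

# Or change it at runtime from the shell, on each mongod:
#   db.adminCommand({ setParameter: 1, cursorTimeoutMillis: 3600000 })
```

Note that the parameter applies to the whole mongod, not to individual cursors.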

Comment by Jörg Rech [ 28/Mar/17 ]

Aggregation of medium-sized datasets (40 million documents) in clusters is still a problem with 3.4.2. When aggregating, we get the following error after ~2 hours:

An exception occurred while aggregating data com.mongodb.CommandFailureException:

{ "serverUsed" : "xxx.xxx.xxx.xxx:27017" , "ok" : 0.0 , "errmsg" : "cursor id 51287731144 didn't exist on server." , "code" : 13127 , "codeName" : "Location13127"}

The same aggregation works for datasets of 20 million documents and takes ~60 minutes.

Is there any solution or workaround that could help? Is it just aggregation, or could a map/reduce help? Is a solution planned for an upcoming release?

By the way, we also have datasets with 400 million and 3 billion documents that should be processed by this aggregation (even if it would take days or weeks), so if someone is working on a solution, please keep users like us in mind.

Comment by Roy Reznik [ 24/Jan/17 ]

The noTimeout option is not applicable here, since the example is an aggregation, an operation that does not support noTimeout.
So currently the only workaround is to set the timeout for the entire mongod? Is that a joke, or are you serious?

Comment by Ramon Fernandez Marina [ 22/Sep/15 ]

robinyarbrough@carfax.com, unfortunately there are no updates on this issue. One can use the noTimeout option, or as Dan mentioned above, a workaround may be possible with SERVER-8188. We'll update this ticket when it gets considered for planning.

EDIT
I'm correcting my previous message because one can't use noTimeout for aggregation operations – apologies for the confusion.

Comment by Robin Yarbrough [ 21/Sep/15 ]

We are also experiencing this issue when running a long-running aggregation query. These particular aggregation queries finished successfully until we upgraded from 2.6.7 to 3.0.4. We keep getting the following error after running for several days: "exception: getMore: cursor didn't exist on server, possible restart or timeout?" Are there any updates on this issue?

Comment by Ben McCann [ 14/Aug/15 ]

Can we allow the cursor timeout to be configured on a per-cursor basis instead of a per-server basis? Queries have a no-timeout option, but that's much too extreme. I just want to set it to something like 1 hour on a couple of them (not for all of them, as that's quite extreme as well!)

Comment by Ian Whalen (Inactive) [ 15/May/15 ]

Intentionally expanding the scope of this ticket to cover all work on cursor keepAlive.

Comment by Daniel Pasette (Inactive) [ 02/Apr/15 ]

This issue can now be worked around with SERVER-8188, which allows the cursor timeout to be configured. That change is in 2.6.9 and in the upcoming 3.0.2 server release.

Comment by Anton Kozak [ 21/Oct/14 ]

See SERVER-15042

Comment by Anton Kozak [ 20/Oct/14 ]

Hi,
We have the same issue; it looks like the aggregation framework doesn't accept the "noTimeout" parameter (which helps for ordinary long-running queries).
I recommend changing the issue type from "Improvement" to "Bug", as I'm sure it affects a lot of users:

db.works.aggregate([
        {$match: {"udate": {$gte: "1412467200000"}}},
        {"$group": {"_id": "$udate", "count": {$sum: 1}}},
        {$out: "works_aggregate"}
],
{allowDiskUse: true})
.addOption(DBQuery.Option.noTimeout);

MongoDB shell version: 2.6.1
connecting to: test
assert: command failed: {
"errmsg" : "exception: getMore: cursor didn't exist on server, possible restart or timeout?",
"code" : 13127,
"ok" : 0
} : aggregate failed
Error: command failed: {
"errmsg" : "exception: getMore: cursor didn't exist on server, possible restart or timeout?",
"code" : 13127,
"ok" : 0
} : aggregate failed
at Error (<anonymous>)
at doassert (src/mongo/shell/assert.js:11:14)
at Function.assert.commandWorked (src/mongo/shell/assert.js:244:5)
at DBCollection.aggregate (src/mongo/shell/collection.js:1149:12)
at group_works.js:21:19
2014-10-18T11:23:47.463-0400 Error: command failed: {
"errmsg" : "exception: getMore: cursor didn't exist on server, possible restart or timeout?",
"code" : 13127,
"ok" : 0
} : aggregate failed at src/mongo/shell/assert.js:13

Comment by Parameswaran [ 15/Oct/14 ]

I have the same issue. For now I have removed the shards, but the aggregation pipeline is taking more than 3 hours for me with 200 million records. Steven, how did you get the result in 13 minutes for 226 million? Is your machine really powerful?

Comment by Steven Castelein [ 15/Oct/14 ]

Any update on this issue? I'm running an aggregation pipeline on a big dataset (226 million records) that performed perfectly on a single mongodb instance (it finished in 13 minutes, which impressed me a lot). Then I set up a cluster of 4 shards and ran the aggregation again, hoping to see some performance increase; however, I got the same error as described in SERVER-13358.

Not only did running the query take longer (perhaps a fault on my side), it didn't even finish! I don't understand why these problems arise, because running a long aggregation pipeline on large datasets is exactly what MongoDB is designed for.

Comment by Vincent [ 13/Sep/14 ]

I'm running into this... I simply can't run a (big) aggregation query on my DB.
Is there any workaround?

Generated at Thu Feb 08 03:10:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.