|
Hey dandv,
Is the behavior of the cursor timeout documented in more detail somewhere?
Unfortunately the best docs I can find quickly are for the noCursorTimeout property documented in the find command.
I'll try to bulk answer the rest of these questions with the following: A "cursor" only truly exists on the server side. On the client side, this involve sending a command which returns a non-zero "cursor id", which the client may use to issue subsequent getMore commands. Each time a server receives a getMore from a client for a particular id it will reset the timer on that cursor. What I suspect happened in your case here is that you are taking more than 10min to do something between calls to next (e.g. getMore), so the server is closing your cursor.
|
|
Thanks for the comprehensive answer, Matt! Is the behavior of the cursor timeout documented in more detail somewhere?
there's a possibility that you get stuck doing one of these operations long enough that the next time your while loop is iterated the cursor has been closed remotely
Does the above mean that one particular iteration of the while loop would have to take over 10 minutes? Or that more than 10 minutes need to pass since the cursor would be initialized? In order words,
- does calling next() reset the cursor timeout?
- or is the cursor timeout reset only when the driver fetches new data from the server, i.e. after all documents in the batch have been requested by the client?
|
|
Hey dandv,
It sounds to me like you are indeed running into the default 10min cursor timeout on the server.
This is all very dependent mostly on two factors: how big the documents are in your batches, and how much time you are spending processing between each call to next. As you increase your batch size, so will you increase the time to receive those batches as you will be bottlenecked by I/O here up to 16MB. Secondly, in your code above I can see sections like Process users as well as a bulk operation. I can't speak for whatever happens in the processing step (you will have to benchmark this), but you are also creating a bulk operation and awaiting the result of that too. Since you are doing all of this serially, there's a possibility that you get stuck doing one of these operations long enough that the next time your while loop is iterated the cursor has been closed remotely.
There are a number of ways you could modify your code to get around this:
- Don't use async/await
As explained above, using async/await will execute these async operations serially by waiting for the result of the promise before moving forward. If you used promises or callbacks directly, you could do this work more asynchronously and would likely not run into this issue
- Disable cursor timeout when calling find (by specifying timeout: false)
This will disable the cursor time out on the server for this cursor. It seems like this is a pretty known workload, so its safe to disable the timeout (too many cursors with no timeout could run the risk of overloading the server though)
- Modify the cursor timeout on the server
We allow you to configure the default cursor timeout on the server, learn more about that here
- Use streams
Streams are somewhat well suited to this consumer/producer task. Essentially you would pipe the cursor into some processing stream, which would eventually create and execute bulk operations. This would ensure that you are receiving as many documents as the local server can handle at any time, and buffering up bulk operations to be executed in order. The objective benefit gained with this approach is the same as using promises/callbacks, as you would not be serially iterating through incoming documents
Hope that helps!
|
|
I'm seeing the same problem (MongoError: cursor id 40280262609 not found) with the v3.2.5 NodeJS driver against a standalone MongoDB 4.0.9 instance, when a cursor.next() loop runs for 16 minutes (there's also a 6-minute timeout in the middle of the run. Maybe the 10-minute is a giveaway?
Here's the code:
const cursor = await users.find(
|
{ },
|
{
|
projection: { ... },
|
}
|
).batchSize(100 * 100);
|
|
let user = await cursor.next();
|
while (user) {
|
const usersInDb = { };const { nModified } =
|
// Get up to PAGE_SIZE users (<PAGE_SIZE at the end of the cursor)
|
for (let i = 0; i < PAGE_SIZE && user; i++, user = await cursor.next())
|
usersInDb[user._id] = user;
|
|
// Process users...
|
// ...
|
const bulk = users.initializeUnorderedBulkOp();
|
// ...
|
bulk.find({ _id: id }).updateOne(userUpdateForDb);
|
|
// Update users
|
try {
|
await bulk.execute();
|
} catch (error) {
|
...
|
}
|
}
|
The collection has ~65,000 documents, and I've reduced the batch size to 10,000 per this SO question in order to keep the cursor alive.
Also posted on GitHub at https://github.com/mongodb-js/mongodb-core/issues/199.
|
|
Pull request submitted: https://github.com/go-mgo/mgo/pull/295
|
|
The Go version is sending
"query" : {
|
"find" : "objects",
|
"filter" : {
|
"_key" : "users:joindate"
|
},
|
"sort" : {
|
"score" : 1,
|
"value" : 1
|
},
|
"projection" : {
|
"score" : 1,
|
"_id" : 0,
|
"value" : 1
|
},
|
"skip" : 0
|
},
|
whereas the mongo shell is sending
"query" : {
|
"find" : "objects",
|
"filter" : {
|
"_key" : "users:joindate"
|
},
|
"sort" : {
|
"score" : 1,
|
"value" : 1
|
},
|
"projection" : {
|
"_id" : 0,
|
"value" : 1,
|
"score" : 1
|
},
|
"noCursorTimeout" : true
|
},
|
It looks like the field is set here: https://github.com/mongodb/mongo/blob/r3.2.7/src/mongo/shell/query.js#L225-L227
I think mongod is looking at the bson property instead of the wire format flag.
|
|
I updated to v3.2.7 and it fails in the same place with Cursor not found, cursor id: 28436833266 right after user 95923, just like before.
> db.serverBuildInfo()
|
{
|
"version" : "3.2.7",
|
"gitVersion" : "4249c1d2b5999ebbf1fdf3bc0e0e3b3ff5c0aaf2",
|
"modules" : [ ],
|
"allocator" : "tcmalloc",
|
"javascriptEngine" : "mozjs",
|
"sysInfo" : "deprecated",
|
"versionArray" : [
|
3,
|
2,
|
7,
|
0
|
],
|
"openssl" : {
|
"running" : "OpenSSL 1.0.1e 11 Feb 2013",
|
"compiled" : "OpenSSL 1.0.1e 11 Feb 2013"
|
},
|
"buildEnvironment" : {
|
"distmod" : "debian71",
|
"distarch" : "x86_64",
|
"cc" : "/opt/mongodbtoolchain/bin/gcc: gcc (GCC) 4.8.2",
|
"ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -Wno-missing-braces -fno-builtin-memcmp",
|
"cxx" : "/opt/mongodbtoolchain/bin/g++: g++ (GCC) 4.8.2",
|
"cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -Wno-maybe-uninitialized -std=c++11",
|
"linkflags" : "-fPIC -pthread -Wl,-z,now -rdynamic -fuse-ld=gold -Wl,-z,noexecstack -Wl,--warn-execstack",
|
"target_arch" : "x86_64",
|
"target_os" : "linux"
|
},
|
"bits" : 64,
|
"debug" : false,
|
"maxBsonObjectSize" : 16777216,
|
"storageEngines" : [
|
"devnull",
|
"ephemeralForTest",
|
"mmapv1",
|
"wiredTiger"
|
],
|
"ok" : 1
|
}
|
The storage engine in use is mmapv1, if that makes a difference.
|
|
I wonder if you're running into SERVER-24058, which was fixed in 3.2.7. Would it be possible for you to test against 3.2.7?
|
|
I modified my code to sleep for 10 minutes before handling the result of the first find query. It looks like the noTimeout flag isn't being set.
> db.serverStatus().metrics.cursor
|
{
|
"timedOut" : NumberLong(0),
|
"open" : {
|
"noTimeout" : NumberLong(0),
|
"pinned" : NumberLong(0),
|
"total" : NumberLong(1)
|
}
|
}
|
Test program:
package main
|
|
import (
|
"log"
|
"net"
|
"os/exec"
|
|
"gopkg.in/mgo.v2"
|
"gopkg.in/mgo.v2/bson"
|
)
|
|
func main() {
|
ip, err := exec.Command("docker", "inspect", "-f", "{{.NetworkSettings.Networks.wtdwtf.IPAddress}}", "wtdwtf-mongo").Output()
|
if err != nil {
|
log.Fatal(err)
|
}
|
|
info, err := mgo.ParseURL(string(ip[:len(ip)-1]))
|
if err != nil {
|
log.Fatal(err)
|
}
|
|
info.DialServer = func(addr *mgo.ServerAddr) (net.Conn, error) {
|
c, err := net.DialTCP("tcp", nil, addr.TCPAddr())
|
if err != nil {
|
return nil, err
|
}
|
|
return &conn{c}, nil
|
}
|
|
mongo, err := mgo.DialWithInfo(info)
|
if err != nil {
|
log.Fatal(err)
|
}
|
defer mongo.Close()
|
|
mongo.SetCursorTimeout(0)
|
|
log.Println("before Find")
|
|
it := mongo.DB("0").C("objects").
|
Find(bson.M{"_key": "users:joindate"}).
|
Sort("score", "value").
|
Select(bson.M{"_id": 0, "value": 1, "score": 1}).
|
Iter()
|
|
log.Println("before Next")
|
|
var data bson.M
|
it.Next(&data)
|
|
log.Println("before Close")
|
if err := it.Close(); err != nil {
|
log.Fatal(err)
|
}
|
|
log.Println("Done")
|
}
|
|
type conn struct {
|
*net.TCPConn
|
}
|
|
func (c *conn) Write(b []byte) (n int, err error) {
|
n, err = c.TCPConn.Write(b)
|
log.Printf("WRITE: %x", b[:n])
|
return
|
}
|
|
func (c *conn) Read(b []byte) (n int, err error) {
|
n, err = c.TCPConn.Read(b)
|
log.Printf("READ: %x", b[:n])
|
return
|
}
|
Output:
2016/07/05 17:18:21 WRITE: 3a0000000100000000000000d40700000000000061646d696e2e24636d640000000000ffffffff13000000106765746e6f6e6365000100000000
|
2016/07/05 17:18:21 READ: 510000006f00000001000000010000000800000000000000000000000000000001000000
|
2016/07/05 17:18:21 WRITE: 3a0000000300000000000000d40700000400000061646d696e2e24636d640000000000ffffffff130000001069736d6173746572000100000000
|
2016/07/05 17:18:21 READ: 2d000000
|
2016/07/05 17:18:21 READ: 026e6f6e636500110000003630333834393066303137643532313600016f6b00000000000000f03f00
|
2016/07/05 17:18:21 READ: c20000007000000003000000010000000800000000000000000000000000000001000000
|
2016/07/05 17:18:21 READ: 9e000000
|
2016/07/05 17:18:21 READ: 0869736d61737465720001106d617842736f6e4f626a65637453697a650000000001106d61784d65737361676553697a65427974657300006cdc02106d61785772697465426174636853697a6500e8030000096c6f63616c54696d6500ad0124bd55010000106d61785769726556657273696f6e0004000000106d696e5769726556657273696f6e0000000000016f6b00000000000000f03f00
|
2016/07/05 17:18:21 WRITE: 360000000500000000000000d40700000400000061646d696e2e24636d640000000000ffffffff0f0000001070696e67000100000000
|
2016/07/05 17:18:21 READ: 350000007100000005000000010000000800000000000000000000000000000001000000
|
2016/07/05 17:18:21 READ: 11000000
|
2016/07/05 17:18:21 READ: 016f6b00000000000000f03f00
|
2016/07/05 17:18:21 before Find
|
2016/07/05 17:18:21 WRITE: df0000000700000000000000d407000010000000302e24636d640000000000ffffffff980000000266696e6400080000006f626a65637473000366696c746572001e000000025f6b6579000f00000075736572733a6a6f696e64617465000003736f7274001b0000001073636f726500010000001076616c75650001000000000370726f6a656374696f6e0024000000105f696400000000001076616c756500010000001073636f726500010000000010736b6970000000000000240000001076616c756500010000001073636f72650001000000105f6964000000000000
|
2016/07/05 17:18:21 before Next
|
2016/07/05 17:18:22 READ: 710f00007200000007000000010000000800000000000000000000000000000001000000
|
2016/07/05 17:18:22 READ: 4d0f0000
|
2016/07/05 17:18:22 READ: 127761697465644d5300000000000000000003637572736f7200220f0000046669727374426174636800f30e0000033000210000000276616c7565000200000031000173636f7265000060848d3535754200033100210000000276616c7565000200000032000173636f72650000600ea14535754200033200210000000276616c7565000200000033000173636f726500001010a14535754200033300210000000276616c7565000200000034000173636f72650000c011a14535754200033400210000000276616c7565000200000035000173636f726500008013a14535754200033500210000000276616c7565000200000036000173636f72650000f014a14535754200033600210000000276616c7565000200000037000173636f726500000016a14535754200033700210000000276616c7565000200000038000173636f726500006017a14535754200033800210000000276616c7565000200000039000173636f72650000c019a14535754200033900220000000276616c756500030000003130000173636f72650000301ba1453575420003313000220000000276616c756500030000003131000173636f72650000001ca1453575420003313100220000000276616c756500030000003132000173636f72650000201da1453575420003313200220000000276616c756500030000003133000173636f72650000301ea1453575420003313300220000000276616c756500030000003134000173636f72650000601fa1453575420003313400220000000276616c756500030000003135000173636f726500000022a1453575420003313500220000000276616c756500030000003136000173636f726500000023a1453575420003313600220000000276616c756500030000003137000173636f726500004025a1453575420003313700220000000276616c756500030000003138000173636f726500005026a1453575420003313800220000000276616c756500030000003139000173636f726500004027a1453575420003313900220000000276616c756500030000003230000173636f726500003028a1453575420003323000220000000276616c756500030000003231000173636f72650000802aa1453575420003323100220000000276616c756500030000003232000173636f72650000a02ba1453575420003323200220000000276616c756500030000003233000173636f72650000b02ca1453575420003323300220000000276616c756500030000003234000173636f72650000d02da1453575420003323400220000000276616c756500030000003235000173636f72650000c02ea1453575420003323500220000000276616c756500030000003236000173636f726500000030a1453575420003323600220000000276616c756500030000003237000173636f726500000031a1453575420003323700220000000276616c756500030000003238000173636f726500000032a1453575420003323800220000000276616c756500030000003239000173636f72650000c033a1453575420003323900220000000276616c756500030000003330000173636f72650000c034a1453575420003333000220000000276616c756500030000003331000173636f72650000b035a1453575420003333100220000000276616c756500030000003332000173636f72650000b036a1453575420003333200220000000276616c756500030000003333000173636f72650000b037a1453575420003333300220000000276616c756500030000003334000173636f72650000a038a1453575420003333400220000000276616c756500030000003335000173636f72650000a039a1453575420003333500220000000276616c756500030000003336000173636f72650000a03aa1453575420003333600220000000276616c756500030000003337000173636f72650000903ba1453575420003333700220000000276616c756500030000003338000173636f72650000803ca1453575420003333800220000000276616c756500030000003339000173636f72650000b03da1453575420003333900220000000276616c756500030000003430000173636f72650000b03fa1453575420003343000220000000276616c756500030000003431000173636f72650000a040a1453575420003343100220000000276616c756500030000003432000173636f726500009041a1453575420003343200220000000276616c756500030000003433000173636f726500008042a1453575420003343300220000000276616c756500030000003434000173636f726500008043a1453575420003343400220000000276616c756500030000003435000173636f726500008044a1453575420003343500220000000276616c756500030000003436000173636f72650000a045a1453575420003343600220000000276616c756500030000003437000173636f72650000a046a1453575420003343700220000000276616c756500030000003438000173636f72650000a047a1453575420003343800220000000276616c756500030000003439000173636f726500009048a1453575420003343900220000000276616c756500030000003530000173636f726500009049a1453575420003353000220000000276616c756500030000003531000173636f72650000804aa1453575420003353100220000000276616c756500030000003532000173636f72650000b04ba1453575420003353200220000000276616c756500030000003533000173636f72650000a04ca1453575420003353300220000000276616c756500030000003534000173636f72650000b04da1453575420003353400220000000276616c756500030000003535000173636f72650000b04ea1453575420003353500220000000276616c756500030000003536000173636f72650000b04fa1453575420003353600220000000276616c756500030000003537000173636f72650000b050a1453575420003353700220000000276616c756500030000003538000173636f72650000b051a1453575420003353800220000000276616c756500030000003539000173636f72650000b052a1453575420003353900220000000276616c756500030000003630000173636f726500009054a1453575420003363000220000000276616c756500030000003631000173636f726500009055a1453575420003363100220000000276616c756500030000003632000173636f726500008056a1453575420003363200220000000276616c756500030000003633000173636f726500008057a1453575420003363300220000000276616c756500030000003634000173636f726500008058a1453575420003363400220000000276616c756500030000003635000173636f72650000505aa1453575420003363500220000000276616c756500030000003636000173636f72650000405ba1453575420003363600220000000276616c756500030000003637000173636f72650000405ca1453575420003363700220000000276616c756500030000003638000173636f72650000305da1453575420003363800220000000276616c756500030000003639000173636f72650000205ea1453575420003363900220000000276616c756500030000003730000173636f72650000105fa1453575420003373000220000000276616c756500030000003731000173636f726500001060a1453575420003373100220000000276616c756500030000003732000173636f726500001061a1453575420003373200220000000276616c756500030000003733000173636f726500002062a1453575420003373300220000000276616c756500030000003734000173636f726500002063a1453575420003373400220000000276616c756500030000003735000173636f726500001064a1453575420003373500220000000276616c756500030000003736000173636f726500000065a1453575420003373600220000000276616c756500030000003737000173636f726500000066a1453575420003373700220000000276616c756500030000003738000173636f72650000f066a1453575420003373800220000000276616c756500030000003739000173636f726500005068a1453575420003373900220000000276616c756500030000003830000173636f726500009069a1453575420003383000220000000276616c756500030000003831000173636f72650000906aa1453575420003383100220000000276616c756500030000003832000173636f72650000806ba1453575420003383200220000000276616c756500030000003833000173636f72650000806ca1453575420003383300220000000276616c756500030000003834000173636f72650000906da1453575420003383400220000000276616c756500030000003835000173636f72650000806ea1453575420003383500220000000276616c756500030000003836000173636f72650000706fa1453575420003383600220000000276616c756500030000003837000173636f726500005070a1453575420003383700220000000276616c756500030000003838000173636f726500004071a1453575420003383800220000000276616c756500030000003839000173636f726500003072a1453575420003383900220000000276616c756500030000003930000173636f726500005073a1453575420003393000220000000276616c756500030000003931000173636f726500004074a1453575420003393100220000000276616c756500030000003932000173636f726500003075a1453575420003393200220000000276616c756500030000003933000173636f726500001076a1453575420003393300220000000276616c756500030000003934000173636f726500000077a1453575420003393400220000000276616c756500030000003935000173636f72650000e077a1453575420003393500220000000276616c756500030000003936000173636f72650000d078a1453575420003393600220000000276616c756500030000003937000173636f72650000d079a1453575420003393700220000000276616c756500030000003938000173636f72650000c07aa1453575420003393800220000000276616c756500030000003939000173636f72650000b07ba1453575420003393900230000000276616c75650004000000313030000173636f72650000907da145357542000331303000230000000276616c75650004000000313031000173636f72650000807ea14535754200001269640023fa388006000000026e73000a000000302e6f626a656374730000016f6b00000000000000f03f00
|
2016/07/05 17:18:22 before Close
|
2016/07/05 17:18:22 WRITE: 200000000000000000000000d7070000000000000100000023fa388006000000
|
2016/07/05 17:18:22 Done
|
That WRITE between before Find and before Next:
00000000: df00 0000 0700 0000 0000 0000 d407 0000 ................
|
00000010: 1000 0000 302e 2463 6d64 0000 0000 00ff ....0.$cmd......
|
00000020: ffff ff98 0000 0002 6669 6e64 0008 0000 ........find....
|
00000030: 006f 626a 6563 7473 0003 6669 6c74 6572 .objects..filter
|
00000040: 001e 0000 0002 5f6b 6579 000f 0000 0075 ......_key.....u
|
00000050: 7365 7273 3a6a 6f69 6e64 6174 6500 0003 sers:joindate...
|
00000060: 736f 7274 001b 0000 0010 7363 6f72 6500 sort......score.
|
00000070: 0100 0000 1076 616c 7565 0001 0000 0000 .....value......
|
00000080: 0370 726f 6a65 6374 696f 6e00 2400 0000 .projection.$...
|
00000090: 105f 6964 0000 0000 0010 7661 6c75 6500 ._id......value.
|
000000a0: 0100 0000 1073 636f 7265 0001 0000 0000 .....score......
|
000000b0: 1073 6b69 7000 0000 0000 0024 0000 0010 .skip......$....
|
000000c0: 7661 6c75 6500 0100 0000 1073 636f 7265 value......score
|
000000d0: 0001 0000 0010 5f69 6400 0000 0000 00 ......_id......
|
Which decodes to (decoded by hand using https://docs.mongodb.com/manual/reference/mongodb-wire-protocol/):
- message length: 223 bytes
- request ID: 7
- response to: 0
- request type: OP_QUERY
- flags: 0x00000010 NoCursorTimeout
- full collection name: 0.$cmd
- number to skip: 0
- number to return: -1
-
{
|
"find": "objects",
|
"filter": {
|
"_key": "users:joindate"
|
},
|
"sort": {
|
"score": 1,
|
"value": 1
|
},
|
"projection": {
|
"_id": 0,
|
"value": 1,
|
"score": 1
|
},
|
"skip": 0
|
}
|
-
{
|
"value": 1,
|
"score": 1,
|
"_id": 0
|
}
|
The cursor is not counted in the noTimeout field of db.serverStatus().metrics.cursor.open, but it is counted in the total field.
|
|
Reposted here: http://stackoverflow.com/questions/38211363/mongodb-cursor-not-found-even-with-no-timeout
|
|
BenLubar, this question would be best posted in the mongodb-user group or Stack Overflow with the mongodb tag, where it will reach a larger audience. If you think you've found a but in the SERVER we'll need a way to reproduce it locally, preferably in the shell to make sure that this is not a driver-related issue either. See also our Technical Support page if you need additional support resources.
Regards,
Ramón.
|
Generated at Thu Feb 08 04:07:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.