Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.6.0-rc2
Component/s: Aggregation Framework
Labels:
None

Operating System:
ALL
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When running a long aggregation against a 3-shard cluster with "allowDiskUse" and "$out" set, the operation eventually fails with the following error:

mongos> db.transactions.aggregate(  [  <some large grouping> ,{ $out: "outputCollection"      } ], { allowDiskUse: true }  );
assert: command failed: {
        "errmsg" : "exception: getMore: cursor didn't exist on server, possible restart or timeout?",
        "code" : 13127,
        "ok" : 0
} : aggregate failed
Error: command failed: {
        "errmsg" : "exception: getMore: cursor didn't exist on server, possible restart or timeout?",  
        "code" : 13127,
        "ok" : 0
} : aggregate failed
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.commandWorked (src/mongo/shell/assert.js:244:5)
    at DBCollection.aggregate (src/mongo/shell/collection.js:1149:12)
    at (shell):1:17
2014-03-26T10:20:41.044+0000 Error: command failed: {
        "errmsg" : "exception: getMore: cursor didn't exist on server, possible restart or timeout?",  
        "code" : 13127,
        "ok" : 0
} : aggregate failed at src/mongo/shell/assert.js:13

The operation does not seem to fail 10 minutes after it is launched from the shell, but much later. I will try to time it.

In the mongos logs, the only relevant line is:

2014-03-26T10:20:40.979+0000 [conn2] command mydb.$cmd command: aggregate { aggregate: "transactions", pipeline: [ { $mergeCursors: [ { host: "MongoDBLinux-1:27017", id: 47309755740 }, { host: "MongoDBLinux-2:27017", id: 35067308903 }, { host: "MongoDBLinux-3:27017", id: 27388885734 } ] }, { $group: { _id: "$$ROOT._id", Terminal: { $first: "$$ROOT.Terminal" }, count: { $sum: "$$ROOT.count" }, $doingMerge: true } }, { $out: "outputCollection" } ], allowDiskUse: true, cursor: {} } keyUpdates:0 numYields:0 locks(micros) r:244 reslen:139 17850228ms

The log line above was written about 5 hours after the operation was launched from the shell.
There are some lines mentioning "killing cursor", but they seem unrelated and appear more often.
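
The "about 5 hours" figure can be checked against the duration at the end of the mongos log line (17850228ms). The following is a minimal sketch in plain JavaScript; the helper name formatDuration is made up for illustration, and the start time is only an estimate that assumes the logged duration covers the whole command.

```javascript
// Convert a duration in milliseconds into whole hours, minutes and seconds.
// formatDuration is a hypothetical helper, not part of the mongo shell.
function formatDuration(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return hours + "h " + minutes + "m " + seconds + "s";
}

// Values copied from the mongos log line above.
const loggedAt = Date.parse("2014-03-26T10:20:40.979Z"); // log timestamp is +0000
const loggedMs = 17850228;

// Assumption: the duration covers the whole command, so subtracting it
// from the log timestamp approximates when the aggregation started.
const estimatedStart = new Date(loggedAt - loggedMs).toISOString();

console.log(formatDuration(loggedMs)); // 4h 57m 30s
console.log(estimatedStart);           // 2014-03-26T05:23:10.751Z
```

So the command ran for just under 5 hours before mongos logged the failure, far longer than a 10 minute window.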

In the mongod logs, no lines mention "killing cursor" or "aggregate".

This is quite problematic, since it makes the aggregation framework unusable for long aggregations.
I will try to disable the cursor timeout in the query to see if it makes a difference.
My wild guess is that this error happens if one shard finishes its job more than 10 minutes after another shard has finished, or something like that.
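
To make the guess above concrete, here is a small plain-JavaScript model of it. This is only an illustration of the hypothesis, not of how the server actually behaves: the function name cursorsAtRisk, the 10 minute idle timeout constant, the assumption that the merge starts reading only once the slowest shard is ready, and the finish times are all made up for the example. Only the host names come from the log line.

```javascript
// Assumed idle cursor timeout of 10 minutes, as mentioned in the report.
const IDLE_TIMEOUT_MS = 10 * 60 * 1000;

// Hypothetical model: each shard's cursor becomes ready at readyAtMs and then
// sits idle until the merge starts reading, which is assumed to happen when
// the slowest shard is ready. Returns the hosts whose cursor would have been
// idle for longer than the timeout at that point.
function cursorsAtRisk(shards, idleTimeoutMs) {
  const mergeStartMs = Math.max(...shards.map((s) => s.readyAtMs));
  return shards
    .filter((s) => mergeStartMs - s.readyAtMs > idleTimeoutMs)
    .map((s) => s.host);
}

// Invented finish times (minutes after launch), for illustration only.
const minutes = (n) => n * 60 * 1000;
const shards = [
  { host: "MongoDBLinux-1:27017", readyAtMs: minutes(60) },
  { host: "MongoDBLinux-2:27017", readyAtMs: minutes(65) },
  { host: "MongoDBLinux-3:27017", readyAtMs: minutes(75) },
];

console.log(cursorsAtRisk(shards, IDLE_TIMEOUT_MS));
// [ 'MongoDBLinux-1:27017' ]
```

Under this model, the first shard's cursor has been idle for 15 minutes when the merge begins, so a getMore against it would fail in the way the error message describes. Whether the real failure follows this pattern would need timing data from each shard.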

duplicates

SERVER-6036 Disable cursor timeout for cursors that belong to a session

Closed

Assignee: Mathias Stearn
Reporter: Antoine Girbal (Inactive)
Participants: Antoine Girbal, Mathias Stearn
Votes: 0
Watchers: 6

Created: Mar 26 2014 06:54:22 PM UTC
Updated: Dec 10 2014 11:06:37 PM UTC
Resolved: Mar 27 2014 08:27:00 PM UTC
