[SERVER-9075] killOp() is not working for aggregate framework Created: 22/Mar/13  Updated: 10/Dec/14  Resolved: 12/Feb/14

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, MapReduce
Affects Version/s: 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Ranjith Govindan Assignee: Mathias Stearn
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux - x64 server


Operating System: Linux

 Description   

I am running an aggregate operation against a collection of 50 million documents in a non-sharded environment. The operation takes hours (> 6) to
finish because of the nature of the call: it requires a complete scan of indexes and possibly documents. The average document size is about 600 bytes,
and the collection has several (20) indexes.
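A repro needs data of roughly this shape, and a synthetic data set could be generated along the following lines. This is a sketch under stated assumptions. The field names (org, author, logintime, time) come from the pipeline in this report. The value distributions, the `pad` field, and the function names `makeDoc` and `makeBatch` are hypothetical. The size target is measured as JSON length, which only approximates BSON size. The report does not say which fields the 20 indexes cover, so no indexes are created here.

```javascript
// Hypothetical generator for synthetic documents shaped like those in this report.
// Field names are taken from the pipeline; value distributions are assumptions.
function makeDoc(i, opts) {
  const doc = {
    org: "org" + (i % opts.orgs),          // opts.orgs distinct org values (assumed)
    author: "user" + (i % opts.authors),   // opts.authors distinct authors (assumed)
    logintime: new Date(Date.UTC(2013, 0, 1) + i * 1000),
    time: i % 3600,
    pad: ""                                // filler to reach the size target
  };
  // Size is estimated as JSON length; the real BSON size will differ somewhat.
  const base = JSON.stringify(doc).length;
  doc.pad = "x".repeat(Math.max(0, opts.targetBytes - base));
  return doc;
}

// Build `count` consecutive documents starting at index `start`.
function makeBatch(start, count, opts) {
  const batch = [];
  for (let i = start; i < start + count; i++) {
    batch.push(makeDoc(i, opts));
  }
  return batch;
}
```

In mongosh, batches could then be inserted with something like `for (let s = 0; s < 50000000; s += 1000) db.events.insertMany(makeBatch(s, 1000, {orgs: 1000, authors: 100000, targetBytes: 600}));`. The collection name `events` and the cardinalities there are also assumptions.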

The problem is with cancellation. I am trying to kill the operation using killOp(). However, the aggregation continues for several hours without being interrupted.
currentOp() recognizes the request: it reports 'killPending': true.
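To check from the shell which aggregations are running and whether a kill has been registered for them, the `inprog` array returned by `db.currentOp()` can be filtered. This is a minimal sketch. It assumes older servers such as 2.4 report the command document under `query` and newer ones under `command`, so both are checked. The helper name `aggregateOps` is hypothetical.

```javascript
// Hypothetical helper: pick aggregate commands out of db.currentOp().inprog.
// Assumption: the command document is under `command` (newer servers)
// or `query` (older servers such as 2.4).
function aggregateOps(inprog) {
  const result = [];
  for (const op of inprog) {
    const cmd = op.command || op.query || {};
    if (cmd.aggregate === undefined) continue; // not an aggregate command
    result.push({
      opid: op.opid,
      ns: op.ns,
      secs: op.secs_running,
      killPending: op.killPending === true // the flag quoted in this report
    });
  }
  return result;
}
```

In the shell this could be used as `aggregateOps(db.currentOp().inprog).forEach(function (o) { if (!o.killPending) db.killOp(o.opid); });`, which kills only aggregations that have not already been marked.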

The aggregation is something like:

[
  {$group: {_id: {org: "$org"}, user: {$last: "$author"}, logintime: {$last: "$logintime"}}},
  {$group: {_id: "$author", total: {$sum: "$time"}, count: {$sum: 1}}}
]



 Comments   
Comment by Mathias Stearn [ 13/May/13 ]

mathias@10gen.com. I'll be sure to delete the data as soon as I am done with it.

Thank you.

Comment by Ranjith Govindan [ 13/May/13 ]

Thank you. I can provide a dump of our data and the queries. I will upload it to an FTP server and provide credentials. Can you send me the email address to which I should send the details? Please note that this data may contain proprietary information, so please use it only for internal debugging.

Comment by Mathias Stearn [ 13/May/13 ]

We are still interested in fixing this, but have been unable to repro on our own. If you can find a way to repro this we would like to solve it.

Comment by Mathias Stearn [ 07/May/13 ]

I have been unable to repro this, and have manually verified that the codepath for the provided pipeline makes the required interrupt checks. How long did you wait after calling killOp? Could you produce a full repro script, including data generation?
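The interrupt checks mentioned here follow a cooperative pattern. killOp only marks the operation, which is why currentOp can show 'killPending': true while it is still running. The operation stops only when its code path next checks that mark. The following is a simplified JavaScript model of that pattern, not the server's C++ implementation. `Operation`, `kill`, `checkForInterrupt`, and `scan` are illustrative names, and the check interval `checkEvery` is an assumed parameter.

```javascript
// Simplified model of cooperative interruption (illustrative, not server code).
class Operation {
  constructor(opid) {
    this.opid = opid;
    this.killPending = false;
  }
  // In this model, killing only sets a flag; nothing stops the work directly.
  kill() {
    this.killPending = true;
  }
  // Interrupt point: throws if a kill has been requested.
  checkForInterrupt() {
    if (this.killPending) {
      throw new Error("operation " + this.opid + " interrupted");
    }
  }
}

// Process `docs`, calling checkForInterrupt every `checkEvery` documents.
// Between checks (or on a path with no checks) a kill has no visible effect.
function scan(op, docs, checkEvery, onDoc) {
  let processed = 0;
  for (const doc of docs) {
    if (processed % checkEvery === 0) op.checkForInterrupt();
    onDoc(doc, processed);
    processed++;
  }
  return processed;
}
```

With frequent checks a kill takes effect almost immediately. If a loop rarely or never reaches a check, it keeps running with the kill flag set, which is the kind of behavior the report describes.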

Generated at Thu Feb 08 03:19:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.