[SERVER-3067] Can't kill indexing operations Created: 09/May/11  Updated: 28/Oct/15  Resolved: 09/Nov/12

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 1.8.0
Fix Version/s: 2.3.1

Type: Bug Priority: Critical - P2
Reporter: Aaron Westendorf Assignee: Aaron Staple
Resolution: Done Votes: 12
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04.2
Linux 2.6.32-30-generic #59-Ubuntu SMP Tue Mar 1 21:30:46 UTC 2011 x86_64 GNU/Linux


Issue Links:
Depends
depends on SERVER-4227 audit killop support Closed
Related
related to SERVER-12164 Write commands are nesting operations... Closed
Operating System: ALL

 Description   

Overview:

This ticket adds support for using killop to interrupt a client-initiated foreground index build that is in progress. When the index build is killed, all resources related to the index are cleaned up and the index is removed from system.indexes. An error response is returned on the connection that initiated the index build. A foreground index build is likewise killed if mongod is shut down while the build is in progress.

Index builds that are not directly initiated by an external client cannot be interrupted this way. For example, map reduce and reindex (among other commands) build indexes as part of their internal implementations, and those builds currently cannot be interrupted.

Aaron
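On a server that includes this fix (Fix Version 2.3.1), killing a build comes down to finding the opid of the client-initiated index build and passing it to db.killOp(). The sketch below is plain JavaScript that filters the inprog array returned by db.currentOp(). The helper name findClientIndexBuilds is hypothetical, and the matching rule (an "insert" into "<db>.system.indexes" whose desc starts with "conn") is an assumption drawn from the currentOp output quoted in this ticket; other server versions may report index builds differently.

```javascript
// Hypothetical helper, not part of the mongo shell.
// Given the `inprog` array from db.currentOp(), return the opids of operations
// that look like client-initiated foreground index builds. The match rule is an
// assumption based on the 1.8/2.x-era output in this ticket: an "insert" into
// "<db>.system.indexes" whose "desc" starts with "conn" (a client connection).
function findClientIndexBuilds(inprog) {
  return inprog
    .filter(function (op) {
      return op.op === "insert" &&
        /\.system\.indexes$/.test(op.ns || "") &&
        /^conn/.test(op.desc || "");
    })
    .map(function (op) {
      return op.opid;
    });
}
```

In the mongo shell this could be used as findClientIndexBuilds(db.currentOp().inprog).forEach(function (id) { db.killOp(id); }). Because the filter is only a heuristic, it is worth reviewing the matched operations before killing them.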

----------------------

I accidentally started a foreground index build on 300 million records. I followed the documentation and tried to kill it, but was unsuccessful. This resulted in significant downtime.

http://www.mongodb.org/display/DOCS/Viewing+and+Terminating+Current+Operation

{
    "opid" : 710799555,
    "active" : true,
    "lockType" : "write",
    "waitingForLock" : false,
    "secs_running" : 5260,
    "op" : "insert",
    "ns" : "production.system.indexes",
    "query" : { },
    "client" : "10.122.166.155:57729",
    "desc" : "conn",
    "msg" : "index: (2/3) btree bottom up 192158787/298486401 64%"
}

I issued the command:
db.killOp(710799555)

The console reported that it was trying to kill the operation, but the operation never stopped. When the master finally finished the build, more than an hour later, we still had to wait for the replica set slaves to build the index as well. The overall outage lasted several hours and could have been avoided if I had been able to kill the job.
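The msg field in the currentOp output above is the most direct way to tell whether a kill request took effect: if its counter keeps rising after db.killOp(), the build is still running. A minimal parser, assuming the "index: (phase/phases) step done/total percent%" layout of the single sample in this report (other versions may format it differently); parseIndexProgress is a hypothetical name.

```javascript
// Hypothetical helper: parse the index-build progress string from currentOp's
// "msg" field, e.g. "index: (2/3) btree bottom up 192158787/298486401 64%".
// The layout is inferred from one sample and may vary by server version;
// returns null when the string does not match.
function parseIndexProgress(msg) {
  const m = /^index: \((\d+)\/(\d+)\) (.*?) (\d+)\/(\d+) (\d+)%$/.exec(msg);
  if (m === null) {
    return null;
  }
  return {
    phase: Number(m[1]),   // current phase of the build
    phases: Number(m[2]),  // total number of phases
    step: m[3],            // description of the current phase
    done: Number(m[4]),    // items processed so far in this phase
    total: Number(m[5]),   // items to process in this phase
    percent: Number(m[6])  // percentage as reported by the server
  };
}
```

For the sample above this yields phase 2 of 3 ("btree bottom up"), with 192158787 of 298486401 items done at 64%. Sampling it a few seconds apart after issuing db.killOp() shows whether the counter is still advancing.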



 Comments   
Comment by Jalmari Raippalinna [ 19/Nov/13 ]

I happened to come across this issue and wanted to add a data point.

About a month ago we killed a background indexing task, and this corrupted the database with an invalid BSONSize error. I can no longer find the logs for it, but I wanted to note that this can happen.

The collection already had sparse and unique indexes; this was an additional index on it.

Comment by auto [ 09/Nov/12 ]

Author: Aaron <aaron@10gen.com>, 2012-11-09T04:51:57Z

Message: SERVER-3067 Disable BSONObjExternalSorter's qsort callback interrupts on solaris. Our solaris build environment does not support exceptions in qsort callbacks.
Branch: master
https://github.com/mongodb/mongo/commit/bdd827db775d7f05de50908e45aca93e0b5c26b6

Comment by auto [ 09/Nov/12 ]

Author: Aaron <aaron@10gen.com>, 2012-11-09T03:15:03Z

Message: SERVER-3067 Fix windows compile by specifying boost::shared_ptr.
Branch: master
https://github.com/mongodb/mongo/commit/75aef9e0f2a1c8d825f280b1210a24a4d23aabf7

Comment by auto [ 09/Nov/12 ]

Author: Aaron <aaron@10gen.com>, 2012-11-09T02:51:33Z

Message: SERVER-3067 Fix rhel warnings.
Branch: master
https://github.com/mongodb/mongo/commit/f225b710cf39e462f782e14d4b517596226f161a

Comment by auto [ 09/Nov/12 ]

Author: Aaron <aaron@10gen.com>, 2012-11-09T01:19:32Z

Message: SERVER-3067 Move old external sort tests from jsobjtests.cpp to extsorttests.cpp
Branch: master
https://github.com/mongodb/mongo/commit/5509b014eb5700543f0339240319f2ebac3bbbe9

Comment by auto [ 09/Nov/12 ]

Author: Aaron <aaron@10gen.com>, 2012-10-25T21:07:23Z

Message: SERVER-3067 Add killop support for foreground index builds.
Branch: master
https://github.com/mongodb/mongo/commit/6a51b6b01e4ebdd723e6ad33f07934d5558f9ad7

Comment by auto [ 24/Oct/12 ]

Author: Aaron <aaron@10gen.com>, 2012-10-23T18:19:29-07:00

Message: SERVER-3067 Remove tab characters from the btreebuilder.cpp file.
Branch: master
https://github.com/mongodb/mongo/commit/df9bc9ca6193fd028045637360ed2dd4aff760e0

Comment by Vinaykr [ 09/Aug/12 ]

We had a similar issue where foreground index creation froze the entire cluster and starved all other read/write ops. It should not be this easy to get into that state. The default option for index creation should be changed to "background always", because for any reasonably sized production deployment foreground index creation never makes sense. If it's a small cluster/db, even a background index build will complete quickly. So I think the current default doesn't make sense.
Until that default is changed, index operations should at least be killable; otherwise you are just screwed!
Thanks!
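Until the server default changes, the "background by default" proposal can be approximated on the client side by routing index creation through a wrapper. A sketch, assuming your application creates indexes through a single helper; withBackgroundDefault is a hypothetical name, not a MongoDB API, and background is the index option referred to elsewhere in this ticket.

```javascript
// Hypothetical wrapper, not a MongoDB API: copy the caller's index options and
// default "background" to true unless the caller explicitly set it.
function withBackgroundDefault(options) {
  const result = { background: true };
  if (options) {
    Object.keys(options).forEach(function (key) {
      // Caller-supplied values, including background: false, take precedence.
      result[key] = options[key];
    });
  }
  return result;
}
```

For example, db.events.ensureIndex({ userId: 1 }, withBackgroundDefault({ unique: true })), where events and userId are placeholders. An explicit background: false still wins, so a deliberate foreground build remains possible.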

Comment by Colin Howe [ 18/Jul/12 ]

Hi Ian,

I'd like to stress how important this is for us. We can have all the replication under the sun and all the multi-DC writes you can shake a stick at, but it is all for nought if I can lock out an entire replica set through an accidental omission of background: true or a migration accidentally not performed in office hours...

The fact that it then also replicates the index build command to replicas makes this even worse.

This, or the suggestion of preventing foreground index builds, would go a long way to making us believe that MongoDB is a robust solution.

Thanks,
Colin

Comment by Ian Whalen (Inactive) [ 17/Jul/12 ]

Hi all, we're still in the planning stages for 2.4 and will evaluate getting this in, although I can't guarantee it will make it.

Comment by Bar Ziony [ 17/Jul/12 ]

Are there any updates from 10gen people about this?

Thanks!

Comment by bahadir cambel [ 13/May/12 ]

db version v2.0.5

I've also found that you cannot even shut down the server to stop the indexing operation:

db.shutdownServer()
assert failed : unexpected error: "shutdownServer failed: db assertion failure"
Error("Printing Stack Trace")@:0
()@shell/utils.js:35
("assert failed : unexpected error: \"shutdownServer failed: db assertion failure\"")@shell/utils.js:46
(false,"unexpected error: \"shutdownServer failed: db assertion failure\"")@shell/utils.js:54
()@shell/db.js:205
@(shell):1

Sun May 13 00:57:58 uncaught exception: assert failed : unexpected error: "shutdownServer failed: db assertion failure"

Comment by Rafael Calsaverini [ 03/May/12 ]

I've got the same problem in 1.8.2. Luckily it wasn't in a production environment, but it blocked my work for a couple of hours.

I have two questions:

1) Is this fixed in newer versions?
2) If I just kill mongod while it's creating a new index, could the collection be corrupted?

Comment by Colin Howe [ 15/Mar/12 ]

Hey, getting this in would be amazing. It's an absolute killer if someone accidentally (in code or manually) creates a foreground index and brings down an entire site with no option apart from a failover.

Comment by Zac Witte [ 01/Mar/12 ]

Any updates on this? I still have the problem in 2.0.2.

Thu Mar 1 10:33:18 [initandlisten] connection accepted from 127.0.0.1:38442 #3
Thu Mar 1 10:33:19 [conn3] going to kill op: op: 89
Thu Mar 1 10:33:19 [conn3] end connection 127.0.0.1:38442
42431700/967940554 4%
43073900/967940554 4%
44016800/967940554 4%
45115100/967940554 4%
46394100/967940554 4%
47535100/967940554 4%
48772000/967940554 5%
...
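The log above shows the pattern reported throughout this ticket: mongod logs "going to kill op" but the progress counter keeps climbing. A small check over copied log lines, assuming the bare "done/total percent%" progress format shown in the excerpt; buildKeptRunning is a hypothetical helper, and lines that do not match that format (timestamps, connection messages, the elision) are ignored.

```javascript
// Hypothetical helper: given mongod log lines captured after a killOp request,
// report whether the index build kept making progress, i.e. the kill did not
// take effect. Only lines of the form "<done>/<total> <pct>%" are considered.
function buildKeptRunning(lines) {
  const counts = lines
    .map(function (line) {
      return /^(\d+)\/(\d+) \d+%$/.exec(line.trim());
    })
    .filter(function (m) {
      return m !== null;
    })
    .map(function (m) {
      return Number(m[1]); // items processed so far
    });
  // Need at least two samples; any net increase means the build is still running.
  return counts.length >= 2 && counts[counts.length - 1] > counts[0];
}
```

Fed the excerpt above, the counter rises from 42431700 to 48772000 after the kill request, so the check reports that the build kept running.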

Comment by Aaron Westendorf [ 16/May/11 ]

I tested background indexing and found the following behavior:

  • While indexing, I could not run queries on the collection being indexed, though I could run other queries
  • I was able to kill the indexing job
  • After killing the job, queries to the collection that I had previously cancelled showed up in the process list

From this, I suspect that the lock an index build takes on the collection prevents other operations from running, possibly including the kill operation itself.

Generated at Thu Feb 08 03:01:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.