[SERVER-14758] killOp on foreground index build does not take effect until the end of the index build Created: 01/Aug/14  Updated: 27/Feb/15  Resolved: 18/Aug/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.6.0, 2.6.3
Fix Version/s: 2.7.5

Type: Bug Priority: Critical - P2
Reporter: Andrew Ryder (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-10550 CurOp::_killPending not always cleare... Closed
is related to DOCS-2742 Document how to cancel an index creation Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:
  1. create a database with a few million documents with some custom fields (exact numbers don't matter).
  2. start a custom foreground index build
  3. in another shell, issue a killOp against the running index build (currentOp first obviously)
  4. note the index build basically until close to the end (always gets to 94+% of the bulk sort in my testing)

repeat these steps in 2.4.10 and observe the index build aborts in around the time it takes to print the next status update to the log.

Participants:

 Description   

db.killOp on a foreground index build is accepted and marks the operation "killPending" : true but nothing actually happens. The index just carries on building. It aborts (correctly) near the end of the process before the commit step.

In 2.4.10 a killOp foreground index build usually only takes tens of seconds before it correctly stops. In 2.6.3, using the same dataset and the same commands, the killOp can take hours to have an effect.



 Comments   
Comment by Mathias Stearn [ 18/Aug/14 ]

Resolved by https://github.com/mongodb/mongo/commit/00913e47de5aced5267e44e82ac9e976bbaac089

Comment by Eric Milkie [ 11/Aug/14 ]

Replicated index builds are prohibited from being killed by the user.

Comment by Jason R. Coombs [ 09/Aug/14 ]

Hi Thomas,

Please see the referenced CS ticket where another MongoDB employee has replicated the issue using MongoDB 2.4.10. In our case, it was an index on a secondary created in the foreground as replicated from an index on the primary created in the background. Perhaps there's some aspect of the replication that's affecting the behavior.

Regards,
Jason

Comment by Thomas Rueckstiess [ 09/Aug/14 ]

Hi Jason,

We haven't been able to reproduce this in 2.4. See my comparison between 2.4 and 2.6 above. If you're seeing this in 2.4, can you provide a logfile showing the behavior? What's the exact version of 2.4 you're running?

Thanks, Thomas

Comment by Jason R. Coombs [ 09/Aug/14 ]

This issue affects MongoDB 2.4 also and violates the claim made in the documentation (http://docs.mongodb.org/manual/tutorial/manage-in-progress-indexing-operations/) that foreground index builds can be terminated.

Comment by Thomas Rueckstiess [ 01/Aug/14 ]

In 2.4.10, an index build gets interrupted immediately.

Fri Aug  1 15:55:39.812 [conn5] going to kill op: op: 24.0
Fri Aug  1 15:55:39.812 [conn5] received interrupt request for unknown op: 24  known ops:
Fri Aug  1 15:55:39.813 [conn4] User Assertion: 11601:operation was interrupted
Fri Aug  1 15:55:39.813 [conn5] runQuery called admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
Fri Aug  1 15:55:39.813 [conn5] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
Fri Aug  1 15:55:39.813 [conn5] command: { replSetGetStatus: 1.0, forShell: 1.0 }
Fri Aug  1 15:55:39.813 [conn5] command admin.$cmd command: { replSetGetStatus: 1.0, forShell: 1.0 } ntoreturn:1 keyUpdates:0  reslen:76 0ms
Fri Aug  1 15:55:40.088 [conn4] error building index: 11601 operation was interrupted
Fri Aug  1 15:55:40.088 [conn4] User Assertion: 10086:ns not found: test.mgendata.$foo_1
Fri Aug  1 15:55:40.088 [conn4] IndexDetails::kill(): couldn't drop ns test.mgendata.$foo_1
Fri Aug  1 15:55:40.089 [conn4] insert test.system.indexes keyUpdates:0 exception: operation was interrupted code:11601 locks(micros) w:7572683 7572ms

In 2.6.x (tested 2.6.0 and 2.6.3, same behavior), the index build finishes and only dies in the "bulk commit" phase.

2014-08-01T15:56:37.608-0400 [conn5] going to kill op: op: 27.0
2014-08-01T15:56:37.609-0400 [conn5] received interrupt request for unknown op: 27  known ops:
2014-08-01T15:56:37.609-0400 [conn5] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2014-08-01T15:56:37.610-0400 [conn5] command: { replSetGetStatus: 1.0, forShell: 1.0 }
2014-08-01T15:56:37.610-0400 [conn5] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0  reslen:76 0ms
2014-08-01T15:56:39.177-0400 [conn4] 		Index Build: 2996000/7462008	40%
2014-08-01T15:56:42.000-0400 [conn4] 		Index Build: 4879300/7462008	65%
2014-08-01T15:56:51.803-0400 [conn4] 		Index Build: 5991900/7462008	80%
2014-08-01T15:56:54.000-0400 [conn4] 		Index Build: 7440900/7462008	99%
2014-08-01T15:56:54.042-0400 [conn4] 	 bulk commit starting
2014-08-01T15:56:57.729-0400 [conn4] User Assertion: 11601:operation was interrupted
2014-08-01T15:56:57.739-0400 [conn4] index build failed. spec: { v: 1, key: { foo: 1.0 }, name: "foo_1", ns: "test.mgendata" } error: 11601 operation was interrupted
 

Generated at Thu Feb 08 03:35:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.