[SERVER-14505] cannot dropAllIndexes when index builds in progress assertion failure Created: 08/Jul/14  Updated: 11/Mar/15  Resolved: 14/Jul/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.6.4, 2.7.4

Type: Bug Priority: Major - P3
Reporter: Ramon Fernandez Marina Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-14494 Dropping collection during active bac... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of Jul 18, 2014

ISSUE SUMMARY
When a background index build is interrupted on a secondary, for example by dropping the collection during the build, the clean-up process can in some circumstances fail with an assertion and not clean up the incomplete index correctly. Further attempts to drop the collection fail because the index is still marked "in progress" and not cleaned up correctly.

See also related issue SERVER-14494.

USER IMPACT
Only background index builds on secondaries are affected.

WORKAROUNDS
Use foreground index builds, or ensure that the secondaries have finished building the index before dropping a collection.

AFFECTED VERSIONS
Production release versions from 2.6.0 to 2.6.3 are affected by this bug.

FIX VERSION
The fix is included in the 2.6.4 production release.

RESOLUTION DETAILS
The patch now ensures that IndexBuildBlock::fail() can never throw an exception.

Original description

Got the assertion failure below while trying to reproduce SERVER-14494 (dropping a collection on the primary while a secondary is in the process of building a background index).

2014-07-08T14:21:58.339-0400 [repl index builder 4] build index on: test.foo properties: { v: 1, key: { n: 1.0 }, name: "n_1", ns: "test.foo", background: true }
2014-07-08T14:21:58.339-0400 [repl index builder 4]      building index in background
2014-07-08T14:22:01.008-0400 [repl index builder 4]             Index Build(background): 435100/1000000 43%
2014-07-08T14:22:03.238-0400 [repl index builder 5] build index on: test.foo properties: { v: 1, key: { s: 1.0 }, name: "s_1", ns: "test.foo", background: true }
2014-07-08T14:22:03.239-0400 [repl index builder 5]      building index in background
2014-07-08T14:22:04.000-0400 [repl index builder 4]             Index Build(background): 759600/1000000 75%
2014-07-08T14:22:04.439-0400 [clientcursormon] mem (MB) res:464 virt:4136
2014-07-08T14:22:04.439-0400 [clientcursormon]  mapped (incl journal view):1600
2014-07-08T14:22:04.439-0400 [clientcursormon]  connections:2
2014-07-08T14:22:04.440-0400 [clientcursormon]  replication threads:32
2014-07-08T14:22:04.610-0400 [repl writer worker 1] CMD: drop test.foo
2014-07-08T14:22:04.610-0400 [repl writer worker 1] halting index build: { s: 1.0 }
2014-07-08T14:22:04.610-0400 [repl writer worker 1] halting index build: { n: 1.0 }
2014-07-08T14:22:04.611-0400 [repl writer worker 1] halted 2 index build(s)
2014-07-08T14:22:04.612-0400 [repl index builder 4] index build failed. spec: { v: 1, key: { n: 1.0 }, name: "n_1", ns: "test. foo", background: true } error: 11601 operation was interrupted
2014-07-08T14:22:04.613-0400 [repl index builder 4] ERROR: backgroundjob repl index builder 4 exception: cannot perform operat ion: a background operation is currently running for collectiontest.foo
2014-07-08T14:22:04.613-0400 [repl index builder 5] index build failed. spec: { v: 1, key: { s: 1.0 }, name: "s_1", ns: "test. foo", background: true } error: 11601 operation was interrupted
2014-07-08T14:22:04.614-0400 [repl index builder 5] IndexBuilder could not build index: Interrupted operation was interrupted
2014-07-08T14:22:04.614-0400 [repl writer worker 1] CMD: drop test.foo
2014-07-08T14:22:04.615-0400 [repl writer worker 1] halting index build: { n: 1.0 }
2014-07-08T14:22:04.615-0400 [repl writer worker 1] halted 1 index build(s)
2014-07-08T14:22:04.615-0400 [repl writer worker 1] Assertion: 17348:cannot dropAllIndexes when index builds in progress
2014-07-08T14:22:04.620-0400 [repl writer worker 1] test 0x1006b916b 0x100670c82 0x10066171f 0x1000f28d4 0x1000e89c6 0x1001bd1
65 0x1001b80d5 0x1001b90d3 0x1001b9e7c 0x1004708ca 0x1004a6568 0x1004aaf43 0x100669ba6 0x1006ed5b5 0x7fff88328899 0x7fff883287
2a 0x7fff8832cfc9
 0   mongod                              0x00000001006b916b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x0000000100670c82 _ZN5mongo10logContextEPKc + 114
 2   mongod                              0x000000010066171f _ZN5mongo11msgassertedEiPKc + 255
 3   mongod                              0x00000001000f28d4 _ZNK5mongo12IndexCatalog9_isSpecOkERKNS_7BSONObjE + 0
 4   mongod                              0x00000001000e89c6 _ZN5mongo8Database14dropCollectionERKNS_10StringDataE + 832
 5   mongod                              0x00000001001bd165 _ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 1057
 6   mongod                              0x00000001001b80d5 _ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 37
 7   mongod                              0x00000001001b90d3 _ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 3527
 8   mongod                              0x00000001001b9e7c _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 1388
 9   mongod                              0x00000001004708ca _ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb + 6666
 10  mongod                              0x00000001004a6568 _ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb + 712
 11  mongod                              0x00000001004aaf43 _ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE + 83
 12  mongod                              0x0000000100669ba6 _ZN5mongo10threadpool6Worker4loopEv + 152
 13  mongod                              0x00000001006ed5b5 thread_proxy + 229
 14  libsystem_pthread.dylib             0x00007fff88328899 _pthread_body + 138
 15  libsystem_pthread.dylib             0x00007fff8832872a _pthread_struct_init + 0
 16  libsystem_pthread.dylib             0x00007fff8832cfc9 thread_start + 13
2014-07-08T14:22:04.620-0400 [repl writer worker 1] warning: drop: dropIndexes for collection failed. cause: cannot dropAllIndexes when index builds in progress. See http://dochub.mongodb.org/core/data-recovery
2014-07-08T14:22:04.621-0400 [repl writer worker 1] warning: repl Failed command { drop: "foo" } on test with status InternalError drop: dropIndexes for collection failed. cause: cannot dropAllIndexes when index builds in progress. See http://dochub.mongodb.org/core/data-recovery during oplog application



 Comments   
Comment by Githook User [ 14/Jul/14 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-14494 SERVER-14505 ensure that IndexBuildBlock::fail() can never throw

(backport of 5e06c1745105c8e2b50ba875e4433cd93f471ea4)
Branch: v2.6
https://github.com/mongodb/mongo/commit/8be2bd74167f69fbca94a02c069fde12fb803722

Comment by Githook User [ 14/Jul/14 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-14494 SERVER-14505 ensure that IndexBuildBlock::fail() can never throw
Branch: master
https://github.com/mongodb/mongo/commit/a1b737a2146ba380ae9a51acfb1138982745d996

Comment by Eric Milkie [ 09/Jul/14 ]

This error is caused by the following:

  • when an index build is interrupted, IndexBuildBlock::fail() is called to clean up the in-progress build
  • depending on how far the index build has progressed, fail() either calls _dropIndex() or _deleteIndexFromDisk().
  • _dropIndex() calls assertNoBgOpInProgress(), perhaps as a sanity check; this assertion fails in the case where a background index fails or is interrupted, which causes the index not to be cleaned up properly
  • later on, the collection drop first tries to drop all indexes, and finds it can't because of the un-cleaned-up in-progress index build

One possible solution would be to move the assertion up into dropIndex(), which would avoid hitting internal uses of _dropIndex().

Generated at Thu Feb 08 03:35:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.