[SERVER-43206] All callers of DatabaseImpl::dropCollectionEvenIfSystem must first assert under a lock that no index builds are in progress Created: 06/Sep/19  Updated: 27/Oct/23  Resolved: 20/Mar/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Gregory Wlodarek
Resolution: Gone away Votes: 0
Labels: groomed
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-46122 Make the drop command abort in-progre... Closed
related to SERVER-46123 Make the dropDatabase command abort i... Closed
related to SERVER-42487 Acquire locks by UUID in IndexBuildsC... Closed
Operating System: ALL
Sprint: Execution Team 2020-04-06
Participants:
Linked BF Score: 9

 Description   

TLDR: DatabaseImpl::dropCollectionEvenIfSystem invariants against index builds in progress by looking at the IndexCatalog state. However, the IndexBuildsCoordinator needs to clean up state after an index build, expecting the collection to continue to exist, without a lock.

----------------------------------
As described in the linked test failure, DatabaseImpl::dropCollectionEvenIfSystem, which is always called under a collection X lock (an invariant), will invariant against any index builds being in progress according to the IndexCatalog. However, the IndexCatalog will be told an index build is complete before the IndexBuildsCoordinator (MultiIndexBlock::cleanUpAfterBuild, IndexBuildInterceptor, etc.) state has been cleaned up. We no longer hold a lock across index build tear down, so DatabaseImpl::dropCollectionEvenIfSystem is able to go ahead and drop the collection, when the index build still needs it to exist. All callers of DatabaseImpl::dropCollectionEvenIfSystem, therefore, must check with BackgroundOperation and IndexBuildsCoordinator that no index builds are in progress – clearly one of the callers is not doing this check, because of the test failure (I didn't go ahead and look for which caller, because there are a lot of them to look at).

That's ^ one proposal for how to fix the problem. I haven't thought about other options. It isn't great because there's no way to enforce new callers of DatabaseImpl::dropCollectionEvenIfSystem don't mess up.

-------------------------------
Prior to SERVER-42487, this bug was feasible, but much less likely to occur. SERVER-42487 Moved index build database and collection locks into a smaller scope because it wasn't guaranteed that the locks would still be locked after that smaller scope ran, in the larger scope. So previously, index builds would have been unprotected only when the index build threw an error; now, it's always unprotected, surfacing this bug – at least I think it's a bug, and if it isn't, then we need better documentation that successful index builds need a lock and unsuccessful index builds don't need a lock for tear down.



 Comments   
Comment by Gregory Wlodarek [ 20/Mar/20 ]

I took a look at this and the associated BF. With the abort changes made to drop (SERVER-46122) and dropDatabase (SERVER-46123) a while back, I'm confident that those changes will prevent this event from reoccurring.

  1. When you drop a collection or database today, we first abort all in-progress index builds on the respective collections.
  2. We replicate the abortIndexBuild oplog entry before replication the drops to the secondaries.

Today, in no way should we be dropping collections with index builds still registered to the IndexBuildsCoordinator.

Comment by Dianna Hohensee (Inactive) [ 27/Feb/20 ]

We would have to look at how index build tear down works now (if it has a final form).

The BF was closed because it didn't reoccur for a long time, not because there was a known fix that went into master.

Comment by Connie Chen [ 20/Feb/20 ]

Review whether this is necessary as the linked BF seems to have gone away

Generated at Thu Feb 08 05:02:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.