[SERVER-32989] `repairDatabase` can race with `dropDatabase`. Created: 30/Jan/18 Updated: 29/Oct/23 Resolved: 13/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | 3.6.0, 3.7.1 |
| Fix Version/s: | 3.6.5, 3.7.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | rollback-non-functional | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v3.6
|
||||||||||||||||
| Sprint: | Repl 2018-02-12, Repl 2018-02-26 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||
| Description |
|
3.6 introduced two-phase drops. For `dropDatabase`, this means all collection drops must be majority confirmed before the database drop itself is processed. When `dropDatabase` starts processing, the in-memory `Database` catalog object is marked as "drop pending". This flag protects against things like collections being created, which the database lock would normally prevent by blocking concurrent access. The problem is that `dropDatabase` releases all locks while waiting for the drops to become majority confirmed, because locks should not be held across a blocking operation.

The interleaving of `repairDatabase` and `dropDatabase`, specifically, is amusing. I believe `repairDatabase` deletes and recreates the in-memory `Database` object, resetting the `_dropPending` member field to false. This allows a `createCollection` to occur before the `dropDatabase` finishes being processed.

Some considerations for addressing this bug: I believe this means that anything acquiring a `Database` object should check `Database::isDropPending`. Additionally, it may make sense for `AutoGetDb` to check the drop-pending flag as a way to centralize that logic. Note that `repairDatabase` grabs a global lock, circumventing any protection that `AutoGetDb` would offer. |
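The interleaving described above can be replayed deterministically in a toy model. This is only a sketch in Python, not MongoDB's C++ code: the `Catalog` and `Database` classes, their method names, and the `RuntimeError` are all hypothetical stand-ins, and locking is left out because the point is only that recreating the object loses the flag.

```python
class Database:
    """Toy stand-in for the in-memory Database catalog object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.drop_pending = False  # models the _dropPending member field
        self.collections = set()


class Catalog:
    """Toy catalog mapping database names to Database objects (hypothetical)."""

    def __init__(self):
        self.databases = {}

    def begin_drop_database(self, name):
        # Phase one of the drop: mark the database drop-pending and drop its
        # collections. In the real server the locks are released after this
        # point while waiting for majority confirmation.
        db = self.databases[name]
        db.drop_pending = True
        db.collections.clear()

    def repair_database(self, name):
        # Behaviour as the ticket describes it before the fix: the object is
        # deleted and recreated, so the drop-pending flag resets to False.
        old = self.databases[name]
        fresh = Database(name)
        fresh.collections = set(old.collections)
        self.databases[name] = fresh

    def create_collection(self, db_name, coll_name):
        db = self.databases[db_name]
        if db.drop_pending:
            raise RuntimeError("database is drop-pending")
        db.collections.add(coll_name)
        return True


def run_race():
    """Replay the problematic ordering; returns what happened at each step."""
    catalog = Catalog()
    catalog.databases["test"] = Database("test")
    catalog.databases["test"].collections.add("c0")

    catalog.begin_drop_database("test")

    # While the flag is set, collection creation is refused.
    blocked_before_repair = False
    try:
        catalog.create_collection("test", "c1")
    except RuntimeError:
        blocked_before_repair = True

    # The repair runs while the drop is still waiting on majority confirmation.
    catalog.repair_database("test")

    # The flag is gone, so the create now succeeds before the drop finishes.
    created_after_repair = catalog.create_collection("test", "c2")
    return (blocked_before_repair,
            created_after_repair,
            catalog.databases["test"].drop_pending)
```

In this model the create is refused before the repair and accepted after it, which is the window the description is concerned with.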
| Comments |
| Comment by Githook User [ 19/Apr/18 ] |
|
Author: {'email': 'judah@mongodb.com', 'username': 'judahschvimer', 'name': 'Judah Schvimer'} Message: (cherry picked from commit 1e2160c8f0480a1d8be1d671b5b7e22e52986a2c) |
| Comment by Githook User [ 13/Feb/18 ] |
|
Author: {'email': 'judah@mongodb.com', 'name': 'Judah Schvimer', 'username': 'judahschvimer'} Message: |
| Comment by Judah Schvimer [ 01/Feb/18 ] |
|
As discussed with daniel.gottlieb, other operations probably should not succeed on a drop-pending database either, but a more general solution for preventing operations on drop-pending databases is out of scope for this ticket. |
| Comment by Judah Schvimer [ 01/Feb/18 ] |
|
To start, I will fix this race by checking whether the database is drop-pending after `repairDatabase` takes the GlobalWrite lock, which it holds for the entire repair (I will enforce that we take at least a database X lock for repair). We only ever set the database to drop-pending while holding an X lock on the database, so once `repairDatabase` holds the GlobalWrite lock, the database cannot be set to drop-pending until the repair completes. We also call repair at the end of initial sync, so the check must be made in both places or in the common code. |
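The shape of that check can be sketched as follows. This is a toy Python model under stated assumptions, not the actual patch: `GLOBAL_WRITE_LOCK`, `DatabaseDropPendingError`, `repair_database`, and `repair_refused` are hypothetical names, a plain `threading.Lock` stands in for the GlobalWrite lock, and the catalog is just a dict.

```python
import threading


class DatabaseDropPendingError(Exception):
    """Hypothetical error raised when repair is attempted on a drop-pending database."""


class Database:
    """Toy stand-in for the in-memory Database catalog object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.drop_pending = False


# Stand-in for the GlobalWrite lock; held for the entire repair.
GLOBAL_WRITE_LOCK = threading.Lock()


def repair_database(catalog, name):
    """Refuse to repair a drop-pending database; otherwise recreate its object."""
    with GLOBAL_WRITE_LOCK:
        db = catalog[name]
        # Check only after the lock is held. In this model the flag is only
        # ever set under a conflicting lock, so it cannot change during repair.
        if db.drop_pending:
            raise DatabaseDropPendingError(name)
        # Recreating the object is now safe: there is no flag to lose.
        catalog[name] = Database(name)
        return catalog[name]


def repair_refused(catalog, name):
    """Small helper for the usage example: True if the repair was rejected."""
    try:
        repair_database(catalog, name)
    except DatabaseDropPendingError:
        return True
    return False
```

Putting the check inside the one function that does the repair mirrors the "common code" option from the comment: any caller, including one at the end of initial sync, would go through the same check.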