[SERVER-34531] listDatabases command can miss a database if a collection in that database is renamed concurrently Created: 17/Apr/18 Updated: 29/Oct/23 Resolved: 15/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Maria van Keulen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | nyc |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Storage NYC 2018-05-07, Storage NYC 2018-05-21 |
| Linked BF Score: | 63 |
| Description |
|
It is possible for the listDatabases command to erroneously omit a database if the database contains a single collection and that collection is concurrently renamed.

The problem stems from the fact that the listDatabases command takes a GlobalLock in MODE_IS. The renameCollection command acquires a GlobalLock in MODE_IX and a MODE_X database lock on the database on which it is performing the rename. Global locks of type IX and IS do not conflict, so the listDatabases command and the renameCollection command are allowed to run concurrently.

When the renameCollection command executes a rename within the same database, it calls DatabaseImpl::renameCollection, which, as part of the rename operation, calls KVDatabaseCatalogEntryBase::renameCollection and removes the entry for the source collection from KVDatabaseCatalogEntryBase::_collections. Before it finishes, it inserts the entry for the destination collection into the same structure. If there was only one collection in the database, KVDatabaseCatalogEntryBase::_collections is empty until the entry for the destination collection is added.

If a listDatabases command is running during this period, it can observe the database object in this state and consider it empty. The emptiness check on KVDatabaseCatalogEntryBase::_collections happens in KVStorageEngine::listDatabases. As a result, the database can be omitted from the results even though it exists. This can be a problem internally, for example for initial sync, which relies on the correctness of the results returned by the listDatabases command for its collection cloning process.

There is a repro attached demonstrating how the listDatabases command can produce incorrect results. There is also a repro attached demonstrating how this issue could lead to a collection missing on a node following initial sync. Running these tests repeatedly should reproduce the respective failures within a few runs. |
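The window described above can be illustrated with a small toy model. This is a sketch only, not MongoDB code and not one of the attached repros: `ToyCatalogEntry`, `list_databases`, and the `between_steps` hook are hypothetical names, and the hook stands in for a concurrent listDatabases call that happens to run between the removal of the source entry and the insertion of the destination entry.

```python
class ToyCatalogEntry:
    """Hypothetical stand-in for a per-database collection map (not MongoDB code)."""

    def __init__(self, collections):
        self._collections = set(collections)

    def is_empty(self):
        return not self._collections

    def rename_collection(self, source, target, between_steps=None):
        # Step 1: remove the entry for the source collection.
        self._collections.remove(source)
        # Window: with a single collection, the map is empty at this point.
        if between_steps is not None:
            between_steps()
        # Step 2: insert the entry for the destination collection.
        self._collections.add(target)


def list_databases(catalog):
    """Toy listDatabases: skip any database whose collection map looks empty."""
    return sorted(name for name, entry in catalog.items() if not entry.is_empty())


catalog = {"test": ToyCatalogEntry(["a"])}
observed = []

# Simulate a listDatabases call landing inside the rename window.
catalog["test"].rename_collection(
    "a", "b", between_steps=lambda: observed.append(list_databases(catalog))
)

during = observed[0]            # result seen inside the window
after = list_databases(catalog)  # result seen once the rename has finished
```

In this model `during` does not contain "test" while `after` does, which mirrors the reported behaviour: the database exists throughout, but a reader that runs inside the window treats it as empty.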
| Comments |
| Comment by Githook User [ 15/May/18 ] |
|
Author: {'email': 'maria@mongodb.com', 'username': 'mvankeulen94', 'name': 'Maria van Keulen'} Message: |
| Comment by Maria van Keulen [ 08/May/18 ] |
|
I believe the bug occurs due to the KVDatabaseCatalogEntryBase::_collections object rather than the DatabaseImpl::_collections object. Hence, I believe this can be fixed at the KVDatabaseCatalogEntryBase level. I have updated the ticket description. |
| Comment by William Schultz (Inactive) [ 18/Apr/18 ] |
|
I was able to reproduce this on 3.6 and 3.4. |
| Comment by Spencer Brody (Inactive) [ 17/Apr/18 ] |
|
Ah I see, I was thinking about the collection getting renamed out of the database, not about it being renamed within the same db. Agreed that is a bug. |
| Comment by William Schultz (Inactive) [ 17/Apr/18 ] |
|
Spencer, you are correct that if a database has no collections, it effectively does not exist, so initial sync should not need to copy anything. But if a database contains a single collection that is in the middle of being renamed within that database, it is incorrect for a listDatabases command to omit that database from its result set, since the database most certainly exists. The problem is that the listDatabases command does not see the renameCollection as a single atomic operation. If a collection exists in database "test", and that collection gets renamed to a different name in the same database, I would consider it an invariant that listDatabases always includes "test" in its results: before, during, and after the renameCollection operation. |
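The invariant stated in this comment can be checked against a toy model in which the rename publishes its result in a single step. This is only a sketch of one possible approach and is not the fix that was committed for this ticket, which the ticket text does not describe. `SwapCatalogEntry`, `list_databases`, and the `before_publish` hook are hypothetical names; the hook samples the reader's view at one point during the rename.

```python
class SwapCatalogEntry:
    """Hypothetical collection map that is replaced in one assignment (not MongoDB code)."""

    def __init__(self, collections):
        self._collections = frozenset(collections)

    def is_empty(self):
        return not self._collections

    def rename_collection(self, source, target, before_publish=None):
        if source not in self._collections:
            raise KeyError(source)
        # Build the post-rename map on the side; readers still see the old map.
        updated = (self._collections - {source}) | {target}
        if before_publish is not None:
            before_publish()
        # Publish the new map in one assignment, so no empty state is ever visible.
        self._collections = updated


def list_databases(catalog):
    """Toy listDatabases: skip any database whose collection map looks empty."""
    return sorted(name for name, entry in catalog.items() if not entry.is_empty())


catalog = {"test": SwapCatalogEntry(["a"])}

snapshots = [list_databases(catalog)]  # before the rename
catalog["test"].rename_collection(
    "a", "b", before_publish=lambda: snapshots.append(list_databases(catalog))
)  # during the rename
snapshots.append(list_databases(catalog))  # after the rename

invariant_holds = all("test" in snapshot for snapshot in snapshots)
```

Here all three snapshots include "test", so the invariant holds at the sampled points. The design choice is that a reader sees either the old map or the new one, never the intermediate empty state.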
| Comment by Spencer Brody (Inactive) [ 17/Apr/18 ] |
|
I'm confused how this causes an issue in initial sync. If the database has no collections in it, then effectively the database doesn't exist. Why would it be a problem for it to not show up in listDatabases? |