[SERVER-34531] listDatabases command can miss a database if a collection in that database is renamed concurrently
Created: 17/Apr/18  Updated: 29/Oct/23  Resolved: 15/May/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 4.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: William Schultz (Inactive)
Assignee: Maria van Keulen
Resolution: Fixed
Votes: 0
Labels: nyc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: initial_sync_concurrent_renames.js, list_databases_and_rename_collection.js
Issue Links:
Depends
depends on SERVER-34968 Running listDatabases command and ren... Closed
Related
related to SERVER-34615 find by UUID can return NamespaceNotF... Closed
related to SERVER-37552 Illegal concurrent access to KVDataba... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2018-05-07, Storage NYC 2018-05-21
Participants:
Linked BF Score: 63

 Description   

It is possible for the listDatabases command to erroneously omit a database if the database contains a single collection and that collection is concurrently renamed. The problem stems from the fact that the listDatabases command takes a GlobalLock in MODE_IS. The renameCollection command acquires a GlobalLock in MODE_IX and a MODE_X database lock on the database on which it is performing the rename. Global locks of type IX and IS do not conflict, so the listDatabases command and renameCollection command are allowed to run concurrently.
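The lock interaction described above can be sketched with the standard multi-granularity lock compatibility table. This is an illustration only, not server code: the resource names `global` and `database:test` are hypothetical, and the sketch assumes, as the description implies, that listDatabases holds no lock on the individual database.

```python
# Compatibility of the four classic multi-granularity lock modes.
# COMPATIBLE[held] is the set of modes another operation may acquire
# on the same resource while `held` is held.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_run_concurrently(locks_a, locks_b):
    """Return True if no resource locked by both operations has conflicting modes."""
    shared = locks_a.keys() & locks_b.keys()
    return all(locks_b[r] in COMPATIBLE[locks_a[r]] for r in shared)


# Hypothetical resource names; the modes come from the description above.
list_databases_locks = {"global": "IS"}
rename_collection_locks = {"global": "IX", "database:test": "X"}

print(can_run_concurrently(list_databases_locks, rename_collection_locks))  # True
```

The only resource both operations lock is the global one, where IS and IX are compatible, so the MODE_X database lock held by the rename never blocks the listing.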

When the renameCollection command executes a rename within the same database, it calls DatabaseImpl::renameCollection, which, as part of the rename operation, calls KVDatabaseCatalogEntryBase::renameCollection. The entry for the source collection is removed from KVDatabaseCatalogEntryBase::_collections, and the entry for the destination collection is inserted into that structure only afterwards, before the rename finishes. If there was only one collection in the database, KVDatabaseCatalogEntryBase::_collections is empty until the entry for the destination collection is added. If a listDatabases command runs during this window, it can observe the database in this state and consider it empty, because KVStorageEngine::listDatabases checks KVDatabaseCatalogEntryBase::_collections for "emptiness". As a result, the database can be omitted from the results even though it exists.
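The window can be reproduced deterministically in a toy model. The sketch below is not the server's implementation: `CatalogEntry`, `list_databases`, and `rename_remove_then_insert` are hypothetical stand-ins, and the `between_steps` callback simulates a concurrent listDatabases that happens to run between the removal and the insertion.

```python
class CatalogEntry:
    """Toy stand-in for a per-database catalog entry (not the server's class)."""

    def __init__(self, collections):
        self.collections = set(collections)


def list_databases(catalog):
    # Mirrors the emptiness check described above: a database whose
    # collection set is empty is skipped.
    return sorted(name for name, entry in catalog.items() if entry.collections)


def rename_remove_then_insert(entry, source, target, between_steps):
    entry.collections.remove(source)
    between_steps()  # a concurrent reader could run at this point
    entry.collections.add(target)


catalog = {"test": CatalogEntry(["coll"])}
observed = []
rename_remove_then_insert(
    catalog["test"], "coll", "renamed",
    between_steps=lambda: observed.append(list_databases(catalog)),
)
print(observed)                 # [[]]
print(list_databases(catalog))  # ['test']
```

The listing taken mid-rename is empty even though "test" holds exactly one collection both before and after the rename.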

This can be a problem internally, for example, for initial sync, which relies on the correctness of the results returned by the listDatabases command for its collection cloning process.

Two repros are attached. One demonstrates how the listDatabases command can produce incorrect results; the other demonstrates how this issue can leave a collection missing on a node following initial sync. Running each test repeatedly for a few runs should produce the respective error case.



 Comments   
Comment by Githook User [ 15/May/18 ]

Author: Maria van Keulen (mvankeulen94) <maria@mongodb.com>

Message: SERVER-34531 Ensure KVCatalog _collections is not empty during rename
Branch: master
https://github.com/mongodb/mongo/commit/0192520fa62db28787a5fb6ad828c1723d7d992c

Comment by Maria van Keulen [ 08/May/18 ]

I believe the bug occurs due to the KVDatabaseCatalogEntryBase::_collections object rather than the DatabaseImpl::_collections object. Hence, I believe this can be fixed at the KVDatabaseCatalogEntryBase level. I have updated the ticket description.

Comment by William Schultz (Inactive) [ 18/Apr/18 ]

I was able to reproduce this on 3.6 and 3.4.

Comment by Spencer Brody (Inactive) [ 17/Apr/18 ]

Ah I see, I was thinking about the collection getting renamed out of the database, not about it being renamed within the same db. Agreed that is a bug.

Comment by William Schultz (Inactive) [ 17/Apr/18 ]

Spencer, you are correct that if a database has no collections, it effectively does not exist, and so we should not need to copy anything for initial sync. But if a database contains a single collection that is in the middle of being renamed within that database, it is incorrect for a listDatabases command to omit that database from its result set, since the database most certainly exists. The problem is that the listDatabases command is not viewing the renameCollection as a single atomic operation. If a collection exists in database "test", and that collection gets renamed to a different name in the same database, I would consider it an invariant that listDatabases always includes "test" in its list of results: before, during, and after the renameCollection operation.
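The invariant can be stated as a check over every state a concurrent reader might observe. The sketch below tries one ordering that satisfies it, inserting the destination entry before removing the source. This is an assumption made for illustration, not necessarily how the server fixes the problem, and all names are hypothetical.

```python
def rename_insert_then_remove(collections, source, target):
    """Yield a snapshot of each state a concurrent reader could observe."""
    yield set(collections)      # before the rename
    collections.add(target)
    yield set(collections)      # during: both names are present
    collections.remove(source)
    yield set(collections)      # after the rename


def database_always_listed(states):
    # The database is listed only while its collection set is non-empty.
    return all(len(state) > 0 for state in states)


test_db = {"coll"}
states = list(rename_insert_then_remove(test_db, "coll", "renamed"))
print(database_always_listed(states))  # True
```

The trade-off of this ordering is that a reader may briefly see both the old and the new name; whether that is acceptable depends on what other readers of the structure expect.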

Comment by Spencer Brody (Inactive) [ 17/Apr/18 ]

I'm confused how this causes an issue in initial sync. If the database has no collections in it, then effectively the database doesn't exist. Why would it be a problem for it to not show up in listDatabases?

Generated at Thu Feb 08 04:36:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.