- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- ALL
- 🟦 Shard Catalog
The CollectionCatalog provides two classes of methods:
- Non-snapshot methods (do NOT consider _pendingCommitNamespaces):
  - getAllDbNames(), getAllDbNamesForTenant(), getAllTenants()
  - getAllCollectionNamesFromDb(), getAllCollectionUUIDsFromDb()
  - range() (iterates _orderedCollections only)
  - catalog::listDatabases() (a free function wrapper that calls getAllDbNames on latest())
  - DatabaseHolder::getNames() (in-memory database list)
- Snapshot-aware methods (check _pendingCommitNamespaces against the storage snapshot):
  - getAllConsistentDbNames(), getAllConsistentDbNamesForTenant()
  - establishConsistentCollections()
Internal usage of catalog routines that list databases or collections without considering the storage snapshot can miss collections/databases that are pending commit.
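The gap between the two classes of methods can be sketched with a toy model. This is only an illustration under stated assumptions: ToyCatalog, ToySnapshot, allDbNames() and allConsistentDbNames() are hypothetical stand-ins, not the real CollectionCatalog API, and the snapshot is reduced to a set of namespaces visible at the read point.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a storage snapshot: the namespaces whose creating
// transaction is visible at the caller's read point.
struct ToySnapshot {
    std::set<std::string> visibleNamespaces;
};

// Hypothetical toy catalog, NOT the real CollectionCatalog.
struct ToyCatalog {
    // Namespaces ("db.coll") already published in the in-memory catalog.
    std::set<std::string> orderedCollections;
    // Namespaces whose DDL is committed or committing in storage but is not
    // yet published in the in-memory catalog.
    std::set<std::string> pendingCommitNamespaces;

    // Extracts the database part of a "db.coll" namespace.
    static std::string dbOf(const std::string& ns) {
        assert(!ns.empty());
        return ns.substr(0, ns.find('.'));
    }

    // Non-snapshot listing: consults published collections only.
    std::set<std::string> allDbNames() const {
        std::set<std::string> dbs;
        for (const auto& ns : orderedCollections) {
            dbs.insert(dbOf(ns));
        }
        return dbs;
    }

    // Snapshot-aware listing (simplified): additionally includes pending
    // namespaces that are visible in the given snapshot.
    std::set<std::string> allConsistentDbNames(const ToySnapshot& snap) const {
        std::set<std::string> dbs = allDbNames();
        for (const auto& ns : pendingCommitNamespaces) {
            if (snap.visibleNamespaces.count(ns) > 0) {
                dbs.insert(dbOf(ns));
            }
        }
        return dbs;
    }
};
```

In this model, a catalog with "newdb.events" in pendingCommitNamespaces and a snapshot that already sees that namespace reports "newdb" from allConsistentDbNames() but not from allDbNames(), which is the kind of miss described above.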
During the investigation, the following cases were considered SAFE and are not listed: usage under an exclusive/global write lock, during startup/recovery, during rollback, or in test-only code.
POTENTIALLY UNSAFE — May need snapshot consistency:
1. validate_db_metadata_cmd.cpp: validateDBMetadata command
getAllDbNames() and range() do not consider pending-commit collections. A newly created database or collection whose DDL transaction has committed in the storage engine but has not yet been reflected in the catalog would be invisible.
2. feature_compatibility_version.cpp: hasNoReplicatedCollections
catalog::listDatabases() uses CollectionCatalog::latest() without any snapshot.
Could miss a pending-commit replicated collection and incorrectly return true.
3. replication_coordinator_external_state_impl.cpp: _dropAllTempCollections
MODE_IX does not block concurrent DDL. catalog::listDatabases() could miss a newly created database with temp collections.
4. temp_collections_cleanup_mongod.cpp: onStepUpComplete
Same as item 3.
5. database_holder_impl.cpp: openDb
The empty check could be wrong: it could log "create database" and set justCreated=true even when a collection already exists (pending commit).
6. cluster_server_parameter_initializer.cpp: getAllTenants
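getAllTenants() does not consider pending-commit namespaces. A tenant that exists only via pending-commit namespaces would be invisible, and the parameter initialization for that tenant would be skipped.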
7. legacy_fcv_step.cpp: FCV upgrade/downgrade
DatabaseHolder::getNames() (4 occurrences) is an in-memory list that does not consider pending-commit databases. During FCV transitions, the FCV lock does not block DDL globally. A database being created concurrently could be missed, and its deprecated catalog metadata would not be cleaned up or validated.
8. upgrade_downgrade_viewless_timeseries.cpp: Timeseries upgrade/downgrade
DatabaseHolder::getNames() doesn't consider pending-commit databases. A timeseries collection in a pending-commit database could be missed during upgrade/downgrade.
9. collection_catalog_helper.cpp: dropCollectionsWithPrefix
Both getAllDbNames() and getAllCollectionUUIDsFromDb() don't consider pending-commit state. A database that exists only via pending-commit namespaces would not be found.
10. move_primary_coordinator.cpp: getCollectionsToClone
range() does not consider pending-commit collections. During movePrimary, a collection whose create DDL transaction has committed but is still pending in the catalog would be missed from the clone list.
11. shardsvr_check_metadata_consistency_participant_command.cpp: Metadata consistency check
range() doesn't consider pending-commit collections. A pending-commit collection would be invisible to the metadata consistency checker, potentially causing false positives for "collection exists on config server but not on shard" inconsistencies.
12. dbhash.cpp: dbHash command
range() iterates _orderedCollections without consulting _pendingCommitNamespaces. A collection that is pending commit but whose transaction is visible at the snapshot would be missed from the hash computation.
13. database_impl.cpp: DatabaseImpl::init
Uses getAllCollectionUUIDsFromDb(). Needs further investigation.
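Items 2 and 5 above both come down to an emptiness check built on a non-snapshot listing. A minimal sketch of that failure mode follows, assuming a toy state in which the pending-commit collections visible at the caller's snapshot are already known. ToyDbState, looksEmptyNonSnapshot() and looksEmptyConsistent() are hypothetical names for illustration and do not exist in the server code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical toy state for one database, not a real server type.
struct ToyDbState {
    // Collections returned by a non-snapshot listing.
    std::set<std::string> published;
    // Pending-commit collections that are visible at the caller's snapshot.
    std::set<std::string> pendingVisible;
};

// Shape of an "is empty" check that only consults the non-snapshot listing.
inline bool looksEmptyNonSnapshot(const ToyDbState& db) {
    return db.published.empty();
}

// The same check against a snapshot-consistent view.
inline bool looksEmptyConsistent(const ToyDbState& db) {
    return db.published.empty() && db.pendingVisible.empty();
}

// Small demonstration of the two checks disagreeing.
inline void demoEmptyCheck() {
    ToyDbState db;
    db.pendingVisible.insert("coll");  // created, but still pending commit
    assert(looksEmptyNonSnapshot(db));   // wrongly reports an empty database
    assert(!looksEmptyConsistent(db));   // sees the pending collection
}
```

Whether each call site listed above actually needs the consistent form depends on the locks and snapshot it holds, which is the question this ticket raises rather than something the sketch settles.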
- fixes SERVER-91861 Audit internal usage of list databases or collections for needing snapshot consistency (Closed)