Fix internal callers of non-snapshot-aware CollectionCatalog listing methods

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • ALL
    • 🟦 Shard Catalog

      The CollectionCatalog provides two classes of methods:

      • Non-snapshot methods (do NOT consider _pendingCommitNamespaces):
        • getAllDbNames(), getAllDbNamesForTenant(), getAllTenants()
        • getAllCollectionNamesFromDb(), getAllCollectionUUIDsFromDb()
        • range() (iterates _orderedCollections only)
        • catalog::listDatabases() (a free function wrapper that calls getAllDbNames on latest())
        • DatabaseHolder::getNames() (in-memory database list)
      • Snapshot-aware methods (check _pendingCommitNamespaces against the storage snapshot):
        • getAllConsistentDbNames(), getAllConsistentDbNamesForTenant()
        • establishConsistentCollections()

      Internal usage of catalog routines that list databases or collections without considering the storage snapshot can miss collections/databases that are pending commit.
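      The gap between the two classes of methods can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyCatalog, ToySnapshot, listDbNames and listConsistentDbNames are hypothetical stand-ins invented for illustration, not the real CollectionCatalog API, and namespaces are modeled as plain "db.coll" strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy stand-in for the in-memory catalog (NOT the real API).
struct ToyCatalog {
    // Published collections: database name -> collection names.
    std::map<std::string, std::set<std::string>> orderedCollections;
    // Namespaces ("db.coll") whose DDL is committing but not yet published.
    std::set<std::string> pendingCommitNamespaces;
};

// Hypothetical stand-in for a storage snapshot: the set of namespaces
// ("db.coll") that are visible in storage at that snapshot.
using ToySnapshot = std::set<std::string>;

// Extracts the database part of a "db.coll" namespace.
std::string dbOf(const std::string& ns) {
    return ns.substr(0, ns.find('.'));
}

// Non-snapshot listing: looks at published in-memory state only.
std::set<std::string> listDbNames(const ToyCatalog& catalog) {
    std::set<std::string> out;
    for (const auto& entry : catalog.orderedCollections) {
        if (!entry.second.empty()) {
            out.insert(entry.first);
        }
    }
    return out;
}

// Snapshot-aware listing: additionally includes databases of pending-commit
// namespaces that the given snapshot can already see.
std::set<std::string> listConsistentDbNames(const ToyCatalog& catalog,
                                            const ToySnapshot& snapshot) {
    std::set<std::string> out = listDbNames(catalog);
    for (const auto& ns : catalog.pendingCommitNamespaces) {
        if (snapshot.count(ns) > 0) {
            out.insert(dbOf(ns));
        }
    }
    return out;
}

// Usage: "newdb" exists only as a pending-commit namespace.
void demoPendingCommitGap() {
    ToyCatalog catalog;
    catalog.orderedCollections["app"].insert("users");
    catalog.pendingCommitNamespaces.insert("newdb.events");
    ToySnapshot snapshot{"app.users", "newdb.events"};

    assert(listDbNames(catalog).count("newdb") == 0);                      // missed
    assert(listConsistentDbNames(catalog, snapshot).count("newdb") == 1);  // found
}
```

      In this model a caller of the non-snapshot listing never sees "newdb", even though a reader on the same snapshot could already read its data, which is the pattern the call sites below may be exposed to.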

      The following cases were considered SAFE during the investigation and are not included: usage under an exclusive/global write lock, during startup/recovery, during rollback, or in test-only code.

      POTENTIALLY UNSAFE — May need snapshot consistency:

      1. validate_db_metadata_cmd.cpp: validateDBMetadata command
      getAllDbNames() and range() do not consider pending-commit collections. A newly created database/collection whose DDL transaction has committed in the storage engine but has not yet been reflected in the catalog would be invisible.

      2. feature_compatibility_version.cpp: hasNoReplicatedCollections
      catalog::listDatabases() uses CollectionCatalog::latest() without any snapshot.
      Could miss a pending-commit replicated collection and incorrectly return true.

      3. replication_coordinator_external_state_impl.cpp: _dropAllTempCollections
      MODE_IX does not block concurrent DDL. catalog::listDatabases() could miss a newly created database with temp collections.

      4. temp_collections_cleanup_mongod.cpp: onStepUpComplete
      Same as 3.

      5. database_holder_impl.cpp: openDb
      The empty check could be wrong: it could log "create database" and set justCreated=true even when a collection already exists (pending commit).

      6. cluster_server_parameter_initializer.cpp: getAllTenants
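      getAllTenants() does not consider pending-commit namespaces. A tenant whose databases exist only as pending-commit namespaces would be invisible, and the parameter initialization for that tenant would be skipped.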

      7. legacy_fcv_step.cpp: FCV upgrade/downgrade
      DatabaseHolder::getNames() (4 occurrences) is an in-memory list that does not consider pending-commit databases. During FCV transitions, the FCV lock does not block DDL globally. A database being created concurrently could be missed, and its deprecated catalog metadata would not be cleaned up or validated.

      8. upgrade_downgrade_viewless_timeseries.cpp: Timeseries upgrade/downgrade
      DatabaseHolder::getNames() doesn't consider pending-commit databases. A timeseries collection in a pending-commit database could be missed during upgrade/downgrade.

      9. collection_catalog_helper.cpp: dropCollectionsWithPrefix
      Both getAllDbNames() and getAllCollectionUUIDsFromDb() don't consider pending-commit state. A database that exists only via pending-commit namespaces would not be found.

      10. move_primary_coordinator.cpp: getCollectionsToClone
      range() does not consider pending-commit collections. During movePrimary, a collection whose create DDL transaction has committed but is still pending in the catalog would be omitted from the clone list.

      11. shardsvr_check_metadata_consistency_participant_command.cpp: Metadata consistency check
      range() doesn't consider pending-commit collections. A pending-commit collection would be invisible to the metadata consistency checker, potentially causing false positives for "collection exists on config server but not on shard" inconsistencies.

      12. dbhash.cpp: dbHash command
      range() iterates _orderedCollections without consulting _pendingCommitNamespaces. A collection that is pending commit but whose transaction is visible at the snapshot would be omitted from the hash computation.

      13. database_impl.cpp: DatabaseImpl::init
      Calls getAllCollectionUUIDsFromDb(). Needs further investigation.
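
      Item 2 above (hasNoReplicatedCollections) shows how a missed pending-commit namespace can turn into a wrong boolean answer. The following is a minimal sketch of that failure mode, not the actual implementation: ToyState and both predicate functions are hypothetical, namespaces are plain "db.coll" strings, and the sketch conservatively treats every pending-commit namespace as existing instead of checking it against a storage snapshot.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical toy state (NOT the real catalog); namespaces are "db.coll".
struct ToyState {
    std::set<std::string> published;      // what a latest()-style listing sees
    std::set<std::string> pendingCommit;  // committed in storage, not yet published
};

// Assumption for this sketch: everything outside the "local" database
// counts as replicated.
bool isReplicatedNs(const std::string& ns) {
    return ns.compare(0, 6, "local.") != 0;
}

// Mirrors the risky pattern: only published namespaces are inspected.
bool hasNoReplicatedCollectionsLatestOnly(const ToyState& state) {
    for (const auto& ns : state.published) {
        if (isReplicatedNs(ns)) {
            return false;
        }
    }
    return true;
}

// Conservative variant: pending-commit namespaces also count as existing.
bool hasNoReplicatedCollectionsWithPending(const ToyState& state) {
    if (!hasNoReplicatedCollectionsLatestOnly(state)) {
        return false;
    }
    for (const auto& ns : state.pendingCommit) {
        if (isReplicatedNs(ns)) {
            return false;
        }
    }
    return true;
}

// Usage: one unreplicated published collection, one replicated pending one.
void demoFalseEmptyAnswer() {
    ToyState state;
    state.published.insert("local.startup_log");
    state.pendingCommit.insert("app.users");

    assert(hasNoReplicatedCollectionsLatestOnly(state));    // incorrectly true
    assert(!hasNoReplicatedCollectionsWithPending(state));  // pending one is seen
}
```

      A real fix would presumably go through the snapshot-aware methods listed at the top of this ticket; the conservative variant here only demonstrates why the answer changes once pending-commit namespaces are taken into account.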

            Assignee:
            Unassigned
            Reporter:
            Igor Praznik