  1. Core Server
  2. SERVER-98705

Disable catalog consistency checker for single-shard cluster with config shard and replica set endpoint

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: 8.1.0-rc0
    • Component/s: Catalog, Sharding
    • None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2024-12-23, CAR Team 2025-01-06
    • 0

      SERVER-90768 introduced a test hook which checks the consistency between the output of listCollections, listIndexes and $listCatalog. For non-passthrough test suites, the hook runs whenever a mongod is shut down or restarted, using the following method:

      Even for sharded clusters with separate mongos/shardsvr/configsvr, this doesn't require accessing any remote metadata: Using a direct connection to a shardsvr's mongod allows operating on the node's local data, bypassing sharding (routing, shard version protocol, routing/filtering metadata refreshes, etc.).
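
      As a rough illustration of the kind of comparison the hook performs, here is a minimal Python sketch. It is not the actual hook code: the function name `find_catalog_mismatches` and the simplified input shapes (collection names from listCollections, index names per collection from listIndexes, and index names per collection extracted from $listCatalog) are assumptions made for this example. In a real run, the inputs would be gathered over a direct connection to the mongod.

```python
from typing import Dict, Iterable, List


def find_catalog_mismatches(
    list_collections: Iterable[str],
    list_indexes: Dict[str, Iterable[str]],
    list_catalog: Dict[str, Iterable[str]],
) -> List[str]:
    """Compare simplified command outputs and describe any inconsistencies.

    All three inputs are hypothetical, flattened views of the real outputs:
    - list_collections: collection names reported by listCollections
    - list_indexes: collection name -> index names reported by listIndexes
    - list_catalog: collection name -> index names found in $listCatalog
    """
    problems: List[str] = []
    names_from_commands = set(list_collections)
    names_from_catalog = set(list_catalog)

    # Collections that only one of the two sources knows about.
    for name in sorted(names_from_commands - names_from_catalog):
        problems.append(f"{name}: in listCollections but not in $listCatalog")
    for name in sorted(names_from_catalog - names_from_commands):
        problems.append(f"{name}: in $listCatalog but not in listCollections")

    # For collections known to both, the index sets must match exactly.
    for name in sorted(names_from_commands & names_from_catalog):
        command_indexes = set(list_indexes.get(name, ()))
        catalog_indexes = set(list_catalog[name])
        if command_indexes != catalog_indexes:
            differing = sorted(command_indexes ^ catalog_indexes)
            problems.append(f"{name}: index mismatch {differing}")

    return problems
```

      An empty result means the three views agree. Because the inputs are all local to the node, nothing in this comparison depends on routing or filtering metadata.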

      However, for single-shard clusters with replica set endpoint and config shard (e.g. the sharding_auto_bootstrap test suite, all feature flags variant), this assumption doesn't hold, for two reasons. First, most commands are forced to go through the sharding code paths. Second, when the cluster lacks a majority, it becomes unable to obtain routing/filtering metadata, because the ShardServerCatalogCacheLoader waits for replication.

      This wasn't initially found to be problematic because, by the time the majority is lost, tests had generally loaded any required routing/filtering metadata in memory. However, this behavior is flaky and known to fail in various scenarios:

      • With the separate catalog cache, a request to a sharded collection that is routed by a mongos is not routed by the replica set endpoint, so the mongods do not learn the routing metadata. When the hook later connects to the replica set endpoint, it begins by sending the request as UNSHARDED, which later triggers a refresh and a wait for replication when checking the shard version. (This currently works only because of a workaround, which will be removed by SERVER-97511.)

      Disable the catalog consistency checker for single-shard clusters with config shard and replica set endpoint. We can re-enable it once the filtering metadata refresh no longer needs to wait for replication, or by refactoring the hook to only run on replica sets with a majority.
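
      The skip condition can be sketched as a small predicate. This is only an illustration in Python, not the change made for this ticket: the `ClusterTopology` type, its fields, and the function name `should_run_catalog_consistency_check` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTopology:
    """Hypothetical description of the cluster a test fixture is running."""

    num_shards: int
    has_config_shard: bool
    uses_replica_set_endpoint: bool


def should_run_catalog_consistency_check(topology: ClusterTopology) -> bool:
    """Return False only for the topology described in this ticket.

    The check is skipped when all three conditions hold at once:
    a single shard, acting as config shard, reached via the replica set endpoint.
    """
    is_problematic = (
        topology.num_shards == 1
        and topology.has_config_shard
        and topology.uses_replica_set_endpoint
    )
    return not is_problematic
```

      Keeping the condition this narrow leaves the checker enabled for plain replica sets and for sharded clusters with a dedicated config server, where a direct connection still bypasses sharding.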

            Assignee:
            joan.bruguera-mico@mongodb.com Joan Bruguera Micó
            Reporter:
            joan.bruguera-mico@mongodb.com Joan Bruguera Micó
            Votes:
            0
            Watchers:
            2
