CheckMetadataConsistency should serialize with addShard


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • CAR Team 2025-09-01, CAR Team 2025-09-15, CAR Team 2025-09-29
    • 2
    • 🟥 DDL, 🟩 Routing and Topology

      During addShard, if we are promoting a replica set to a sharded cluster and a checkMetadataConsistency is issued at the wrong time, it can end up either reporting a metadata inconsistency or crashing because of this invariant.

      The addShard promotion works as follows:
      1) acquire the critical section on the databases of the replica set
      2) register databases on the config server and register the new shard within a transaction
      3) register the databases on the new shard
      4) release the databases' critical sections

      If a checkMetadataConsistency runs after 1) but before 4), it tries to access metadata while addShard is holding the critical section, which is considered an illegal operation (according to pol.pinol@mongodb.com).
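
      The window can be illustrated with a toy model. This is a minimal Python sketch, not MongoDB code: `ToyShard`, `promote`, `check_metadata` and `CriticalSectionHeld` are hypothetical names, the critical section is modeled as a plain set of database names, and the `between_steps` hook only exists to simulate a check landing between steps 1) and 4).

```python
class CriticalSectionHeld(Exception):
    """Raised when metadata is read while a critical section is held."""


class ToyShard:
    def __init__(self, databases):
        self.databases = list(databases)
        self.in_critical_section = set()
        self.registered = set()

    def check_metadata(self):
        # Stand-in for checkMetadataConsistency: in this model, reading the
        # metadata of a database under critical section is an illegal access.
        for db in self.databases:
            if db in self.in_critical_section:
                raise CriticalSectionHeld(db)
        return "ok"

    def promote(self, between_steps=None):
        # 1) acquire the critical section on every database
        self.in_critical_section.update(self.databases)
        try:
            # 2) register databases and shard on the config server
            #    (omitted in this toy model)
            if between_steps is not None:
                between_steps()  # simulates a concurrent check in the window
            # 3) register the databases on the new shard
            self.registered.update(self.databases)
        finally:
            # 4) release the critical sections
            self.in_critical_section.clear()
```

      Calling `check_metadata()` from the `between_steps` hook raises `CriticalSectionHeld`, while the same call after `promote()` returns succeeds, which mirrors the narrow window described above.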

      This would cause a real problem only in tests: in user builds the tassert is just a uassert, so the only visible effect is that we report inconsistent metadata on a shard that is undergoing a promotion... which is fair.

      The window is quite tight, but eventually an Evergreen test will hit this crash.

      The task is to figure out how to avoid this situation, most likely by serializing checkMetadataConsistency with addShard.
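
      One possible shape of such serialization, sketched in Python under the assumption that both operations take the same mutex. The ticket does not say which mechanism would actually be used, so `ToyCluster`, `_topology_lock` and the `during_promotion` hook are hypothetical and only illustrate the idea that the check cannot start while a promotion is in flight.

```python
import threading


class ToyCluster:
    def __init__(self, databases):
        self.databases = list(databases)
        self.in_critical_section = set()
        # Hypothetical lock shared by both operations.
        self._topology_lock = threading.Lock()

    def add_shard(self, during_promotion=None):
        # The whole promotion runs under the shared lock.
        with self._topology_lock:
            self.in_critical_section.update(self.databases)
            try:
                if during_promotion is not None:
                    during_promotion()  # test hook: work done mid-promotion
            finally:
                self.in_critical_section.clear()

    def check_metadata_consistency(self):
        # Blocks until no promotion is in flight, so it can never observe
        # a database that is still under critical section.
        with self._topology_lock:
            if self.in_critical_section:
                raise RuntimeError("metadata read under critical section")
            return "ok"
```

      With this arrangement a check issued during a promotion simply waits and then runs against the fully registered shard, at the cost of blocking checks for the duration of addShard.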

            Assignee:
            Tommaso Tocci
            Reporter:
            Wolfee Farkas