Investigate mid-commit failure on shard catalog with concurrent split/merges


    • Type: Task
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • CAR Team 2026-05-11

      Every DDL that commits to the shard catalog follows the same pattern:

      • Commit to global catalog
      • Commit to shard catalog
        • Fetch information from CSRS
        • Write collection metadata
        • Write chunks metadata
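
      The pattern above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `Catalog`, `commit_to_global_catalog`, and `commit_to_shard_catalog` are hypothetical names, not server code, and the overwrite-style writes are one assumed way of making the shard-side commit idempotent.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy stand-in for a catalog (hypothetical, not the server's data model)."""
    collections: dict = field(default_factory=dict)  # namespace -> metadata
    chunks: dict = field(default_factory=dict)       # namespace -> [(min, max, shard)]


def commit_to_global_catalog(csrs, ns, metadata, chunks):
    # Step 1: the DDL commits to the global catalog (CSRS).
    csrs.collections[ns] = dict(metadata)
    csrs.chunks[ns] = list(chunks)


def commit_to_shard_catalog(csrs, shard, ns):
    # Step 2a: fetch the current information from CSRS.
    metadata = dict(csrs.collections[ns])
    chunks = list(csrs.chunks[ns])
    # Step 2b: write collection metadata (overwrite, so repeating it is harmless).
    shard.collections[ns] = metadata
    # Step 2c: write chunks metadata (replace the whole set for this namespace).
    shard.chunks[ns] = chunks


csrs, shard = Catalog(), Catalog()
commit_to_global_catalog(csrs, "db.coll", {"uuid": "u1"}, [(0, 100, "shardA")])
commit_to_shard_catalog(csrs, shard, "db.coll")
commit_to_shard_catalog(csrs, shard, "db.coll")  # a retry leaves the same state
```

      In this model, running the shard-side commit twice produces the same shard catalog as running it once, which is the idempotency property the ticket relies on.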

      Once this project (shards authoritative) is complete, all DDLs and chunk operations will be authoritative, and each DDL will stop migrations and split/merges when it starts.

      However, during upgrade/downgrade and throughout the development of this project, it is unclear whether there are gaps in the implementation.

      What happens if a commit to the shard catalog fails partway through and a non-authoritative split/merge then occurs?

      The commit to the shard catalog is idempotent, but it was not designed with the expectation that the routing table could change in the meantime (while not in domain coverage for each shard).

      This ticket is to investigate and unit test this scenario.
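
      As a starting point for the unit test, the scenario could be modeled roughly as follows. This is a hedged sketch, not the server's behavior: every name here (`InjectedFailure`, `commit_to_shard_catalog`, `split_chunk`, `run_scenario`) is hypothetical, the catalogs are plain dictionaries, and the sketch assumes that each commit attempt re-fetches from CSRS before writing.

```python
class InjectedFailure(Exception):
    """Simulated crash between the two shard catalog writes (hypothetical)."""


def commit_to_shard_catalog(csrs, shard, ns, fail_before_chunks=False):
    # Assumption of this sketch: every attempt re-fetches from CSRS.
    metadata = dict(csrs["collections"][ns])
    chunks = list(csrs["chunks"][ns])
    shard["collections"][ns] = metadata  # write collection metadata
    if fail_before_chunks:
        raise InjectedFailure(ns)        # fail before the chunks are written
    shard["chunks"][ns] = chunks         # write chunks metadata


def split_chunk(csrs, ns, split_point):
    # Non-authoritative split: only the CSRS routing table changes.
    new_chunks = []
    for lo, hi, owner in csrs["chunks"][ns]:
        if lo < split_point < hi:
            new_chunks += [(lo, split_point, owner), (split_point, hi, owner)]
        else:
            new_chunks.append((lo, hi, owner))
    csrs["chunks"][ns] = new_chunks


def run_scenario():
    ns = "db.coll"
    csrs = {"collections": {ns: {"uuid": "u1"}},
            "chunks": {ns: [(0, 100, "shardA")]}}
    shard = {"collections": {}, "chunks": {}}
    try:
        commit_to_shard_catalog(csrs, shard, ns, fail_before_chunks=True)
    except InjectedFailure:
        pass                                  # first attempt fails mid-commit
    split_chunk(csrs, ns, 50)                 # routing table changes meanwhile
    commit_to_shard_catalog(csrs, shard, ns)  # retry of the idempotent commit
    return csrs, shard


csrs, shard = run_scenario()
assert shard["chunks"]["db.coll"] == csrs["chunks"]["db.coll"]
```

      In this toy model the retry converges on the post-split routing table only because it re-fetches from CSRS. A real test would need to check whether that assumption holds in the actual commit path.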

            Assignee: Pol Pinol
            Reporter: Pol Pinol