Investigate removing/reducing the CRUD block during granularity changes on timeseries

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 6.0.0, 7.0.0, 8.0.0, 8.1.0, 8.3.0-rc0, 8.2.0
    • Component/s: None
    • Catalog and Routing
    • 🟥 DDL

      When a collMod changes the timeseries granularity of a tracked collection, the collMod DDL coordinator blocks CRUD operations on every shard that owns data for that collection. This appears to be done only because, when a timeseries collection is sharded on time, query routing depends on the granularity value in addition to the chunk bounds, so the Shard Version Protocol alone is not sufficient.
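
The dependency on granularity can be illustrated with a simplified routing model. This is only a sketch: `bucket_min_time`, `target_shard`, the split point and the shard names are hypothetical, and the rounding values are assumed to match the documented defaults for each granularity (minute, hour, day). It is not the server's routing code.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Assumed rounding (in seconds) applied to a bucket's minimum time per granularity.
BUCKET_ROUNDING_SECONDS = {"seconds": 60, "minutes": 3600, "hours": 86400}


def bucket_min_time(ts, granularity):
    """Round a measurement timestamp down to the bucket boundary."""
    step = BUCKET_ROUNDING_SECONDS[granularity]
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)


def target_shard(ts, granularity, split_points, shards):
    """Pick the shard whose chunk [lower, upper) contains the rounded time.

    split_points are hypothetical chunk boundaries on the time shard key;
    shards[i] owns the chunk that ends at split_points[i].
    """
    key = bucket_min_time(ts, granularity)
    return shards[bisect_right(split_points, key)]


split_points = [datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)]
shards = ["shardA", "shardB"]
ts = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)

# Same chunk bounds, same measurement, different granularity:
print(target_shard(ts, "seconds", split_points, shards))  # shardB
print(target_shard(ts, "hours", split_points, shards))    # shardA
```

In this model the chunk bounds never change, yet the target shard does, which is why a routing check based only on chunk versions would not notice the granularity change.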

      In addition, the way CRUD operations are unblocked is prone to causing unavailability in sharded collections. CRUD is unblocked at the beginning of the ShardSvrCollModParticipant command sent to each shard, and the collMod coordinator serializes these commands. So, if the DB primary shard fails or cannot immediately execute the collMod participant, the collection remains unavailable on the other shards.
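
A toy model of this failure mode, for illustration only. The `Shard` class and `run_coordinator` function are hypothetical stand-ins rather than server code, the model assumes the DB primary shard's participant runs first, and it makes a single attempt instead of retrying.

```python
class Shard:
    """Hypothetical in-memory stand-in for a shard; not a real MongoDB API."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.crud_blocked = False

    def run_collmod_participant(self):
        if not self.reachable:
            raise ConnectionError(f"{self.name} cannot run the participant")
        # Unblocking is the first thing the participant does.
        self.crud_blocked = False
        # ... the local catalog update would follow here ...


def run_coordinator(primary, others):
    """Block CRUD everywhere, then run participants one at a time."""
    shards = [primary, *others]
    for shard in shards:
        shard.crud_blocked = True
    for shard in shards:  # assumed order: DB primary shard first
        shard.run_collmod_participant()


primary = Shard("primary", reachable=False)
others = [Shard("shard1"), Shard("shard2")]
try:
    run_coordinator(primary, others)
except ConnectionError:
    pass  # a real coordinator would keep retrying; the block persists meanwhile

still_blocked = [s.name for s in others if s.crud_blocked]
print(still_blocked)  # ['shard1', 'shard2']
```

The healthy shards never reach their participant command, so nothing lifts their CRUD block while the primary is unavailable.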

      This has made issues like SERVER-108801 and SERVER-107819 more impactful.

      We should investigate:

      • Can we entirely avoid blocking CRUD for granularity changes on timeseries collections that are not sharded on time?
      • Can we unblock CRUD as a separate action performed before sending the ShardSvrCollModParticipant commands?
        • This may be possible because the CRUD block appears to be designed to synchronize the granularity update on the global catalog (not the granularity update on the local catalog).
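
The second question can be sketched as a reordering of the coordinator's steps. This is a hypothetical model, not a proposed implementation: `Shard`, `unblock_crud` and `run_coordinator` are invented for illustration, and whether unblocking before the local catalog update is actually safe is exactly what needs investigating.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical in-memory stand-in for a shard; not a real MongoDB API."""
    name: str
    reachable: bool = True
    crud_blocked: bool = False
    granularity: str = "seconds"

    def unblock_crud(self):
        if not self.reachable:
            raise ConnectionError(self.name)
        self.crud_blocked = False

    def run_collmod_participant(self, granularity):
        if not self.reachable:
            raise ConnectionError(self.name)
        self.granularity = granularity  # local catalog update only


def run_coordinator(primary, others, granularity):
    """Unblock CRUD as its own step, then run the participants."""
    shards = [primary, *others]
    for shard in shards:
        shard.crud_blocked = True
    # ... the global catalog update would happen here, while CRUD is blocked ...

    # Separate unblock step: a failure on one shard does not hold back the rest.
    for shard in shards:
        try:
            shard.unblock_crud()
        except ConnectionError:
            pass  # a real coordinator would retry; omitted in this sketch

    needs_retry = []
    for shard in shards:
        try:
            shard.run_collmod_participant(granularity)
        except ConnectionError:
            needs_retry.append(shard.name)
    return needs_retry


primary = Shard("primary", reachable=False)
others = [Shard("shard1"), Shard("shard2")]
needs_retry = run_coordinator(primary, others, "minutes")
print(needs_retry)                        # ['primary']
print([s.crud_blocked for s in others])   # [False, False]
```

In this model an unreachable DB primary shard only delays its own participant; the other shards resume serving CRUD as soon as the unblock step reaches them.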

            Assignee: Unassigned
            Reporter: Joan Bruguera Micó