Review locking liveness for shard catalog timeseries upgrade/downgrade


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0
    • Component/s: None
    • Catalog and Routing
    • 🟦 Shard Catalog

      The viewless timeseries upgrade/downgrade, as implemented by SERVER-114830 / SERVER-114505, must lock both the main (view) namespace and the buckets namespace.

       

      Viewful timeseries collections don't generally follow the canonical locking order (increasing ResourceId). To preclude deadlocks, we acquire the locks with a deadline (30s) and retry indefinitely whenever that deadline expires.
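
      As an illustration only, the deadline-plus-retry pattern can be sketched in Python with standard-library locks. This is not the server implementation: the function name, the parameters, and the choice to release the first lock before retrying are all assumptions made for this sketch.

```python
import threading
import time


def acquire_both_with_retry(first, second, deadline_s, backoff_s=0.0):
    """Acquire two locks under a shared deadline, retrying until both are held.

    Hypothetical helper: returns the number of attempts it took.
    """
    attempts = 0
    while True:  # indefinite retry loop, as described in the ticket
        attempts += 1
        end = time.monotonic() + deadline_s
        if first.acquire(timeout=deadline_s):
            # Spend only what is left of the deadline on the second lock.
            remaining = max(0.0, end - time.monotonic())
            if second.acquire(timeout=remaining):
                return attempts
            # Deadline expired: drop the first lock so that a thread locking
            # in the opposite order can make progress, then try again.
            first.release()
        time.sleep(backoff_s)
```

      Nothing bounds the number of attempts, so a caller competing with long-held locks can spin for an arbitrarily long time, which is the liveness concern raised here.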

       

      The loop usually succeeds quickly, but in general it may take a long time to succeed (e.g. in a cluster busy with long-running reads that hold locks).

       

      This ticket is to review whether this could realistically happen and, if so, to implement a solution (e.g. make viewful timeseries collections follow the canonical locking order, bubble up the error, extend the deadline, kill conflicting operations: SERVER-106990, unify the buckets-view lock: SERVER-99646, etc.).
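
      For reference, the first option (canonical locking order) amounts to always acquiring locks in increasing resource-id order, so that no two operations can each hold a lock the other is waiting for. A minimal Python sketch, assuming a hypothetical mapping from integer resource ids to locks (none of these names come from the server code):

```python
import threading
from contextlib import ExitStack


def lock_in_canonical_order(locks_by_resource_id):
    """Acquire every lock in increasing resource-id order.

    Hypothetical helper: returns an ExitStack that releases the locks
    (in reverse order) when closed or used as a context manager.
    """
    stack = ExitStack()
    try:
        for resource_id in sorted(locks_by_resource_id):
            stack.enter_context(locks_by_resource_id[resource_id])
    except BaseException:
        stack.close()  # release whatever was already acquired
        raise
    return stack
```

      With a single global order, no deadline or retry loop is needed to avoid deadlocks between callers that use this helper.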

            Assignee:
            Unassigned
            Reporter:
            Joan Bruguera Micó