Capture Detailed Logical Initial Sync Metrics

    • Replication

      There is very little visibility today into the aggregate behavior and operation of logical initial syncs. Across a large number of clusters, it is very difficult to answer:

      • What is the normalized throughput of a logical initial sync?
      • What is the success rate of logical initial syncs?
      • What is the typical duration of an initial sync?
      • In which phase of logical initial sync do most failures occur?
      • How many logical initial syncs are there on a given day?

      To help answer these questions and others, we should capture relevant metrics. We can take inspiration from resharding metrics.

      When complete, it should be possible to build a funnel chart/diagram detailing clusters' progress through the end-to-end logical initial sync process, as well as charts detailing the performance of logical initial syncs.

      Unlike today, these metrics should remain available after initial sync ends.
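
As a rough illustration of how per-sync records could answer the questions listed above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `InitialSyncRecord` schema, its field names, the phase label used in the example, and the choice of "total bytes copied divided by total seconds" as one possible throughput normalization. None of it reflects an existing server metric or API.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class InitialSyncRecord:
    """One finished logical initial sync attempt (hypothetical schema)."""
    cluster_id: str
    day: str                            # calendar day the attempt started
    duration_secs: float
    bytes_copied: int
    failed_phase: Optional[str] = None  # None means the attempt succeeded


def summarize(records):
    """Aggregate per-attempt records into fleet-wide summary values."""
    if not records:
        return {}
    successes = [r for r in records if r.failed_phase is None]
    total_secs = sum(r.duration_secs for r in records)
    total_bytes = sum(r.bytes_copied for r in records)
    return {
        "success_rate": len(successes) / len(records),
        "median_duration_secs": median(r.duration_secs for r in records),
        # Assumed normalization: bytes copied per second across all attempts.
        "throughput_bytes_per_sec": total_bytes / total_secs if total_secs > 0 else 0.0,
        "failures_by_phase": Counter(
            r.failed_phase for r in records if r.failed_phase is not None
        ),
        "syncs_per_day": Counter(r.day for r in records),
    }


# Example data; the phase name below is a made-up label.
records = [
    InitialSyncRecord("c1", "2024-01-01", 100.0, 1000),
    InitialSyncRecord("c2", "2024-01-01", 300.0, 3000, "collection_cloning"),
    InitialSyncRecord("c3", "2024-01-02", 200.0, 2000),
]
summary = summarize(records)
```

The `failures_by_phase` counter is the kind of input a funnel chart would need, and `syncs_per_day` gives the daily volume. Such records would only be useful if they are persisted after the sync ends, which is the point of the last requirement above.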

            Assignee:
            Unassigned
            Reporter:
            Matt Panton