Add metrics for diagnosing resharding validation


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability

      Here are the metrics we might want to expose via currentOp, serverStatus and logs:

      1. The lag of the ReshardingChangeStreamsMonitor on each donor and recipient. This can be calculated as the difference between the current clusterTime and the clusterTime of the postBatchResumeToken returned by the latest $changeStream getMore.
      2. The time between when a donor enters "blocking-writes" and when its ReshardingChangeStreamsMonitor completes.
      3. The time between when a recipient enters "strict-consistency" and when its ReshardingChangeStreamsMonitor completes.
      4. The start/end/total time for change streams monitoring on each donor and recipient.
      5. The start/end/total time for the verification that the coordinator performs before transitioning to "applying" and before transitioning to "done".
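
As a rough illustration of items 1 to 3, the lag and the elapsed times reduce to differences between cluster times. The sketch below is only an assumption-laden model, not the server implementation: `ClusterTime`, `monitor_lag_seconds` and `elapsed_seconds` are hypothetical names, the cluster time is assumed to have already been extracted from the postBatchResumeToken, and the result is reported in whole seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ClusterTime:
    """Hypothetical stand-in for a cluster time: epoch seconds plus an ordinal."""
    seconds: int
    increment: int = 0


def monitor_lag_seconds(current: ClusterTime, last_resume_token: ClusterTime) -> int:
    """Item 1: current clusterTime minus the clusterTime of the latest
    postBatchResumeToken, in whole seconds and clamped at zero."""
    return max(0, current.seconds - last_resume_token.seconds)


def elapsed_seconds(state_entered: ClusterTime, monitor_completed: ClusterTime) -> int:
    """Items 2 and 3: time from entering a state (for example "blocking-writes")
    until the change streams monitor completes, clamped at zero."""
    return max(0, monitor_completed.seconds - state_entered.seconds)


# Example values are made up for illustration.
lag = monitor_lag_seconds(ClusterTime(1700000012, 3), ClusterTime(1700000005, 1))
wait = elapsed_seconds(ClusterTime(1700000000), ClusterTime(1700000009))
```

Clamping at zero is a design choice for the sketch: it avoids reporting a negative lag if the resume token's time is momentarily ahead of the locally observed clusterTime.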

      Please note that we are already exposing the duration of each state in currentOp and start/end time of each state in serverStatus.

            Assignee:
            Unassigned
            Reporter:
            Cheahuychou Mao
            Votes:
            0
            Watchers:
            3
