  1. Core Server
  2. SERVER-54960

ReshardingMetrics time smearing

    • Fully Compatible
    • ALL
    • v5.0
    • 1

      Each ReshardingMetrics::OperationMetrics::TimeInterval is an "active" object with its own clockSource, which means the intervals are essentially independent, unlinked stopwatches.

      This is redundant, and it also creates an architectural problem: ReshardingMetrics now needs a ServiceContext*, which is a much heavier dependency than necessary. All ReshardingMetrics really needs is one ClockSource*, and the TimeIntervals do not need their own copies of it.

      Since each TimeInterval calls clockSource->now() internally, the metrics are susceptible to time smearing errors: the intervals are started and stopped individually, each at a slightly different instant. While this doesn't have practical implications, it could make testing more difficult. We wouldn't, for example, be able to assert that the times spent in each phase add up to the total operation time. We would need to estimate some error terms or accept flaky tests.
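The smearing effect can be reproduced in isolation. The following is a minimal sketch, not the actual server code: `TickingClock` and `ActiveInterval` are hypothetical stand-ins for ClockSource and TimeInterval, and the clock is assumed to advance by one millisecond on every read so that the error is deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for a clock source: every now() call observes a
// later instant, as a real clock would between two separate reads.
class TickingClock {
public:
    Millis now() { return Millis{_ticks++}; }

private:
    std::int64_t _ticks = 0;
};

// Hypothetical "active" interval: it reads the clock itself on start/end.
class ActiveInterval {
public:
    explicit ActiveInterval(TickingClock* clock) : _clock(clock) {}

    void start() { _start = _clock->now(); }
    void end() { _end = _clock->now(); }

    Millis duration() const {
        assert(_end >= _start);
        return _end - _start;
    }

private:
    TickingClock* _clock;
    Millis _start{0};
    Millis _end{0};
};

// Returns total time minus the sum of the phase times. A nonzero result
// is the smearing error caused by the independent clock reads.
Millis smearingError() {
    TickingClock clock;
    ActiveInterval total(&clock), phaseA(&clock), phaseB(&clock);

    total.start();   // reads t=0
    phaseA.start();  // reads t=1, although logically simultaneous with total
    phaseA.end();    // reads t=2
    phaseB.start();  // reads t=3, although logically simultaneous with phaseA's end
    phaseB.end();    // reads t=4
    total.end();     // reads t=5

    return total.duration() - (phaseA.duration() + phaseB.duration());
}
```

With this fake clock the phases account for only 2 ms of a 5 ms total, so a test asserting that the phases sum to the total would fail unless it allowed for an error term.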

      A small change that would solve these problems would be to keep the ClockSource* as a concern only of the ReshardingMetrics object and, whenever we lock its _mutex, make a single now() call, so that everything ReshardingMetrics records is based on the instant at which its mutex was locked. I think this will simplify things. The TimeInterval methods start, end, tryEnd, and duration would all accept a Date_t now instead of making their own clockSource->now() calls internally.
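A rough sketch of the proposed shape, under stated assumptions: `SketchMetrics`, `PassiveInterval`, and `TickingClock` are hypothetical names standing in for ReshardingMetrics, TimeInterval, and ClockSource; `std::chrono::milliseconds` stands in for Date_t; and the event methods are invented for illustration. The only point is that the clock is read once per locked section and the result is passed down.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <optional>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for ClockSource; each read observes a later instant.
class TickingClock {
public:
    Millis now() { return Millis{_ticks++}; }

private:
    std::int64_t _ticks = 0;
};

// Passive interval: it never reads a clock, callers pass `now` in.
class PassiveInterval {
public:
    void start(Millis now) {
        _start = now;
        _end.reset();
    }
    void end(Millis now) {
        assert(_start.has_value());
        _end = now;
    }
    // Elapsed time so far if still running, final length if ended.
    Millis duration(Millis now) const {
        if (!_start.has_value()) return Millis{0};
        return _end.value_or(now) - *_start;
    }

private:
    std::optional<Millis> _start;
    std::optional<Millis> _end;
};

// Hypothetical metrics owner: the only object that holds the clock pointer.
class SketchMetrics {
public:
    explicit SketchMetrics(TickingClock* clock) : _clock(clock) {}

    void onStart() {
        std::lock_guard<std::mutex> lk(_mutex);
        const Millis now = _clock->now();  // single read per locked section
        _total.start(now);
        _phaseA.start(now);
    }
    void onPhaseChange() {
        std::lock_guard<std::mutex> lk(_mutex);
        const Millis now = _clock->now();
        _phaseA.end(now);
        _phaseB.start(now);
    }
    void onCompletion() {
        std::lock_guard<std::mutex> lk(_mutex);
        const Millis now = _clock->now();
        _phaseB.end(now);
        _total.end(now);
    }
    // Total minus the sum of phases; zero when the intervals are coherent.
    Millis smearingError() {
        std::lock_guard<std::mutex> lk(_mutex);
        const Millis now = _clock->now();
        return _total.duration(now) - (_phaseA.duration(now) + _phaseB.duration(now));
    }

private:
    TickingClock* _clock;
    std::mutex _mutex;
    PassiveInterval _total, _phaseA, _phaseB;
};
```

Because every interval touched under one lock sees the same `now`, the phase durations partition the total exactly, and a test can assert equality without an error term. The owner also depends only on a clock pointer rather than on a larger context object.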

            Assignee:
            billy.donahue@mongodb.com Billy Donahue
            Reporter:
            billy.donahue@mongodb.com Billy Donahue
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: