Core Server / SERVER-68783

Recipient shard may incorrectly return 0 milliseconds remaining in resharding

    • Minor Change
    • ALL
    • v6.1
    • Sharding 2022-08-22, Sharding 2022-09-05
    • 3

      In response to a _shardsvrReshardingOperationTime command from the resharding coordinator (used to query the estimated time remaining in a resharding operation), a recipient shard calls ReshardingMetrics::getRecipientHighEstimateRemainingTimeMillis to compute its estimate. That function may incorrectly return 0 if the shard has just had a failover and has not yet restored all of its metrics. This can happen because the metrics are restored in two separate places in the code, so there is a window in which only part of them has been restored.

      As a result, if a _shardsvrReshardingOperationTime command arrives before restoration has finished, it may observe only partly restored metrics, and the resulting estimate of 0 would mislead the coordinator into believing that it can begin the critical section.

      This is related to SERVER-67653 but is not the same issue: in that ticket, the coordinator incorrectly treats an omitted remainingMillis field as 0. In this ticket, the recipient itself incorrectly returns a remainingMillis of 0.
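
The race described above can be illustrated with a small, simplified model. This is only a sketch in Python, not the server's C++ code: the RecipientMetrics class, its fields, the two restore steps, and the estimate formula are all hypothetical stand-ins chosen for illustration. It shows how an estimate computed between two restore steps can come out as 0, and one possible guard that reports "unknown" (None) until restoration completes. Such a guard only helps if the coordinator does not treat a missing value as 0, which is the concern of SERVER-67653.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecipientMetrics:
    """Hypothetical stand-in for a recipient shard's resharding metrics."""
    bytes_to_copy: int = 0
    bytes_copied: int = 0
    copy_elapsed_ms: int = 0
    fully_restored: bool = False

    def restore_step_one(self, bytes_copied: int, copy_elapsed_ms: int) -> None:
        # First (assumed) restore location: progress counters only.
        self.bytes_copied = bytes_copied
        self.copy_elapsed_ms = copy_elapsed_ms

    def restore_step_two(self, bytes_to_copy: int) -> None:
        # Second (assumed) restore location: the total amount of work.
        self.bytes_to_copy = bytes_to_copy
        self.fully_restored = True


def naive_remaining_ms(m: RecipientMetrics) -> int:
    # Extrapolate from the observed copy rate. With bytes_to_copy still at
    # its default of 0, the remaining work looks like 0 bytes.
    remaining_bytes = max(m.bytes_to_copy - m.bytes_copied, 0)
    return m.copy_elapsed_ms * remaining_bytes // max(m.bytes_copied, 1)


def guarded_remaining_ms(m: RecipientMetrics) -> Optional[int]:
    # One possible guard: report no estimate until every metric is restored.
    if not m.fully_restored:
        return None
    return naive_remaining_ms(m)


m = RecipientMetrics()
m.restore_step_one(bytes_copied=400, copy_elapsed_ms=2_000)
partial_naive = naive_remaining_ms(m)      # query lands between the two steps
partial_guarded = guarded_remaining_ms(m)
m.restore_step_two(bytes_to_copy=1_000)
full = guarded_remaining_ms(m)
print(partial_naive, partial_guarded, full)  # prints: 0 None 3000
```

In this model, the naive estimate between the two steps is 0 even though 600 bytes remain, which is the misleading answer the ticket describes.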

            Assignee:
            brett.nawrocki@mongodb.com Brett Nawrocki
            Reporter:
            andrew.witten@mongodb.com Andrew Witten (Inactive)
            Votes:
            0
            Watchers:
            3
