Core Server / SERVER-84769

Resharding remainingOpTime algorithm doesn't work with low elapsedTime

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • 5.0.0, 6.0.0, 7.0.0, 7.2.0
    • Sharding
    • Cluster Scalability
    • ALL

    Description

      The resharding commit monitor uses the following algorithm to estimate the remaining time for a resharding operation to complete:

      Milliseconds remainingTime(Milliseconds elapsedTime, double elapsedWork, double totalWork) {
          elapsedWork = std::min(elapsedWork, totalWork);
          double remainingMsec = 1.0 * elapsedTime.count() * (totalWork / elapsedWork - 1);
          return Milliseconds(Milliseconds::rep(remainingMsec));
      }
      

      If elapsedTime is on the order of a few milliseconds, remainingMsec can be badly underestimated. For example, in HELP-54235, with ~300k fetched oplog entries (totalWork), 1000 applied oplog entries (elapsedWork), and an elapsedTime of 6 ms, the estimate causes the commit monitor to engage the critical section (CS), because:

      remainingMsec = 1.0 * 6 * (300000 / 1000 - 1) = 6 * 299 = 1794 ms ≈ 1.8 seconds < 2 seconds.
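
      The arithmetic above can be reproduced outside the server with a small sketch. It assumes std::chrono::milliseconds as a stand-in for the server's Milliseconds type. The 2-second threshold is taken from the description, and belowThreshold and checkExample are hypothetical helpers added only for illustration; they are not part of the server code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Assumption: std::chrono::milliseconds stands in for the server's Milliseconds type.
using Milliseconds = std::chrono::milliseconds;

// Same arithmetic as the function quoted in the description:
// remaining = elapsedTime * (totalWork / elapsedWork - 1).
Milliseconds remainingTime(Milliseconds elapsedTime, double elapsedWork, double totalWork) {
    elapsedWork = std::min(elapsedWork, totalWork);
    double remainingMsec = 1.0 * elapsedTime.count() * (totalWork / elapsedWork - 1);
    return Milliseconds(Milliseconds::rep(remainingMsec));
}

// Hypothetical helper: true when the estimate falls under the given threshold.
bool belowThreshold(Milliseconds estimate, Milliseconds threshold) {
    return estimate < threshold;
}

// Reproduces the HELP-54235 numbers: 6 ms elapsed, 1000 of ~300k entries applied.
void checkExample() {
    const Milliseconds estimate = remainingTime(Milliseconds(6), 1000.0, 300000.0);
    assert(estimate.count() == 1794);                      // 6 * (300 - 1)
    assert(belowThreshold(estimate, Milliseconds(2000)));  // 1794 ms < 2 s
}
```

      With only 1 in 300 entries applied, the estimate still lands under 2 seconds because the 6 ms elapsedTime sample is too short to be a representative rate.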

      This algorithm needs to change to handle this edge case.

          People

            Assignee: Backlog - Cluster Scalability (backlog-server-cluster-scalability)
            Reporter: Abdul Qadeer (abdul.qadeer@mongodb.com)
            Votes: 0
            Watchers: 4