Details
Bug
Resolution: Unresolved
Major - P3
None
5.0.0, 6.0.0, 7.0.0, 7.2.0
Cluster Scalability
ALL
Description
The resharding commit monitor uses the following algorithm to estimate the remaining time (remainingTime) for a resharding operation to complete:
Milliseconds remainingTime(Milliseconds elapsedTime, double elapsedWork, double totalWork) {
    elapsedWork = std::min(elapsedWork, totalWork);
    double remainingMsec = 1.0 * elapsedTime.count() * (totalWork / elapsedWork - 1);
    return Milliseconds(Milliseconds::rep(remainingMsec));
}
If elapsedTime is on the order of a few milliseconds, remainingMsec can be reported incorrectly. For example, in HELP-54235, with ~300k fetched oplog entries (totalWork), 1000 applied oplog entries (elapsedWork), and an elapsedTime of 6 ms, the estimate is low enough to engage the critical section (CS):
remainingMsec = 1.0 * 6 * (300000 / 1000 - 1) = 6 * 299 = 1794 ms ≈ 1.8 seconds < 2 seconds.
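The arithmetic above can be reproduced outside the server with a small standalone sketch. It assumes `std::chrono::milliseconds` as a stand-in for the server's `Milliseconds` type, and `isBelowThreshold` is a hypothetical helper added only for illustration; the 2-second threshold is taken from the example above.

```cpp
#include <algorithm>
#include <chrono>

// Assumption: std::chrono::milliseconds stands in for the server's Milliseconds type.
using Milliseconds = std::chrono::milliseconds;

// Same formula as in the description: linear extrapolation from the work done so far.
Milliseconds remainingTime(Milliseconds elapsedTime, double elapsedWork, double totalWork) {
    elapsedWork = std::min(elapsedWork, totalWork);
    double remainingMsec = 1.0 * elapsedTime.count() * (totalWork / elapsedWork - 1);
    return Milliseconds(Milliseconds::rep(remainingMsec));
}

// Hypothetical helper (not from the server code): true when the estimate
// is under the given threshold.
bool isBelowThreshold(Milliseconds estimate, Milliseconds threshold) {
    return estimate < threshold;
}
```

Calling `remainingTime(Milliseconds(6), 1000.0, 300000.0)` yields 1794 ms, which `isBelowThreshold` reports as under a `Milliseconds(2000)` threshold, even though only 1000 of roughly 300k entries have been applied.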
This algorithm needs to change to handle this edge case.
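One possible direction, shown only as a sketch and not as the actual fix, is to refuse to produce an estimate until enough time has elapsed for the extrapolation to be meaningful. The name `guardedRemainingTime` and the constant `kMinElapsedForEstimate` (with its 1-second value) are hypothetical, and `std::chrono::milliseconds` again stands in for the server's `Milliseconds` type.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

// Assumption: std::chrono::milliseconds stands in for the server's Milliseconds type.
using Milliseconds = std::chrono::milliseconds;

// Hypothetical minimum sample window; the value is illustrative only.
constexpr Milliseconds kMinElapsedForEstimate{1000};

// Returns no estimate when the sample is too short (or no work has been done),
// so the caller cannot mistake a tiny elapsedTime for near-completion.
std::optional<Milliseconds> guardedRemainingTime(Milliseconds elapsedTime,
                                                 double elapsedWork,
                                                 double totalWork) {
    if (elapsedTime < kMinElapsedForEstimate || elapsedWork <= 0) {
        return std::nullopt;
    }
    elapsedWork = std::min(elapsedWork, totalWork);
    double remainingMsec = 1.0 * elapsedTime.count() * (totalWork / elapsedWork - 1);
    return Milliseconds(Milliseconds::rep(remainingMsec));
}
```

With this guard, the 6 ms case from the example returns no estimate, while a longer sample such as 60 seconds with the same work counts returns 17,940,000 ms. Whether a time-based window, a minimum amount of work, or some other condition is the right guard is left open here.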