[SERVER-54227] resharding metrics estimate algorithm Created: 02/Feb/21  Updated: 29/Oct/23  Resolved: 09/Feb/21

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2021-02-22, Service Arch 2021-02-08
Participants:

 Description   

 

https://github.com/mongodb/mongo/blob/4941c446e9091caa1f7151bd874c8d5b72c39bb8/src/mongo/db/s/resharding/resharding_metrics.cpp#L226

This algorithm estimates remaining operation time.

I believe it's incorrect in the case of (bytesCopied>0 && oplogEntriesApplied==0), which would be the forward-looking estimate made from within the copy phase.

It also has significant numerical issues: the integer math is too coarse because integer division happens too eagerly, both via operator/ and via duration casting.

Let's review, improve, and test it.
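
To illustrate the numerical concern, here is a minimal sketch, not the server's actual code. The helper names remainingEager and remainingLate and their parameters are hypothetical, and the formula remaining = elapsed * (1 - progress) / progress is an assumed simple linear extrapolation. The sketch only contrasts truncating early (duration_cast to seconds, integer operator/) with deferring the truncation to the final step.

```cpp
#include <chrono>
#include <cstdint>

using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::chrono::seconds;

// Hypothetical helper: truncates too early.
// duration_cast<seconds> drops the fractional second, and total / done
// drops the fractional part of the ratio before it is ever used.
milliseconds remainingEager(milliseconds elapsed, std::int64_t done, std::int64_t total) {
    seconds elapsedSecs = duration_cast<seconds>(elapsed);
    std::int64_t ratio = total / done;  // integer division
    return duration_cast<milliseconds>(elapsedSecs * ratio - elapsedSecs);
}

// Hypothetical helper: computes in floating point and truncates once, at the end.
milliseconds remainingLate(milliseconds elapsed, std::int64_t done, std::int64_t total) {
    double progress = static_cast<double>(done) / static_cast<double>(total);
    duration<double, std::milli> remaining = elapsed * ((1.0 - progress) / progress);
    return duration_cast<milliseconds>(remaining);
}
```

With elapsed = 1500 ms, done = 3, and total = 10, remainingEager yields 2000 ms, while remainingLate yields roughly 3500 ms, so under these assumptions the eager version underestimates by more than 40%.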

 



 Comments   
Comment by Githook User [ 09/Feb/21 ]

Author: Billy Donahue <billy.donahue@mongodb.com> (BillyDonahue)

Message: SERVER-54227 ReshardingMetrics fix "remaining time" estimate
Branch: master
https://github.com/mongodb/mongo/commit/c6afa9e7eb6c5f9ea7ce19c86b1658d055ec3872

Comment by Billy Donahue [ 03/Feb/21 ]

The existing tests for this only passed because they used numbers that were exact multiples of the elapsed time and the expected work, so they never observed rounding effects. They also only tested the case where the elapsed time was exactly at the halfway point of the task, which hid the bug: at that point progress == 1-progress.
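
A minimal sketch of why a halfway-only test can't catch this kind of error. The two functions below are hypothetical illustrations, not the code before or after the fix; remainingSwapped simply stands in for any formula that confuses progress with 1-progress.

```cpp
// Assumed linear extrapolation: time left = elapsed * (work left / work done).
double remainingCorrect(double elapsedSecs, double progress) {
    return elapsedSecs * (1.0 - progress) / progress;
}

// Hypothetical faulty variant with progress and (1 - progress) swapped.
double remainingSwapped(double elapsedSecs, double progress) {
    return elapsedSecs * progress / (1.0 - progress);
}
```

At progress = 0.5 both functions return the elapsed time itself, so a test at the halfway point passes either way. At progress = 0.25 with 120 seconds elapsed, remainingCorrect gives 360 seconds and remainingSwapped gives 40 seconds, so a test away from the midpoint tells them apart.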

 

Comment by Billy Donahue [ 03/Feb/21 ]

Code Review https://mongodbcr.appspot.com/764110019/
