[SERVER-67653] Resharding coordinator can incorrectly conclude that it can start the critical section although on one recipient the oplog applier hasn't caught up with the oplog fetcher Created: 29/Jun/22 Updated: 29/Oct/23 Resolved: 24/Aug/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.13, 6.0.2, 6.1.0-rc3, 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Cheahuychou Mao | Assignee: | Andrew Witten (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam2 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v6.1, v6.0, v5.0
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Story Points: | 3 | ||||||||||||||||
| Description |
|
The resharding coordinator queries all recipients for an estimation of the remaining time for the active resharding operation of participant shards in CoordinatorCommitMonitor::queryRemainingOperationTimeForRecipients using command _shardsvrReshardingOperationTime.
The recipient shards handle that command here. The function ReshardingMetrics::getOperationRemainingTime could possibly return boost::none. In that case, the "recipientMillis" field of the recipient's response to the coordinator will be omitted.
If all participants were to omit this field, then these if statements won't be entered and when the participant reads the max remaining time (here) remainingTimes.max would still be 0 and remainingTimes.min would still be Milliseconds::max (as they were initialized). In particular, the invariant mentioned here would fail.
The effect of this, is that a recipient returning an empty "remainingMillis" field is equivalent to it returning "remainingMillis: 0." This is a bug in at least one case: where _shardsvrReshardingOperationTime is run against a recipient shard before the recipient shard has restored its metrics (during a step up).
As a result, the coordinator would, believing that recipientMillis was under the threshold for all recipients, prematurely begin the critical section, and the resharding operation would fail with ReshardingCriticalSectionTimeout if the recipient above doesn't manage to enter the "strict-consistency" state within the timeout. |
| Comments |
| Comment by Githook User [ 20/Sep/22 ] | |||||||||||||
|
Author: {'name': 'Andrew Witten', 'email': 'andrew.witten@mongodb.com', 'username': 'awitten1'}Message: (cherry picked from commit abd6330d793235c8fbc5de7bf3ec53855ebea9d3)
(cherry picked from commit dd9d37fa07a7af46ac4ee8dabeb235dc9571afb0) | |||||||||||||
| Comment by Githook User [ 14/Sep/22 ] | |||||||||||||
|
Author: {'name': 'Andrew Witten', 'email': 'andrew.witten@mongodb.com', 'username': 'awitten1'}Message: (cherry picked from commit abd6330d793235c8fbc5de7bf3ec53855ebea9d3)
| |||||||||||||
| Comment by Githook User [ 14/Sep/22 ] | |||||||||||||
|
Author: {'name': 'Andrew Witten', 'email': 'andrew.witten@mongodb.com', 'username': 'awitten1'}Message: (cherry picked from commit abd6330d793235c8fbc5de7bf3ec53855ebea9d3)
| |||||||||||||
| Comment by Max Hirschhorn [ 01/Sep/22 ] | |||||||||||||
|
Author: {'name': 'Andrew Witten', 'email': 'andrew.witten@mongodb.com', 'username': 'awitten1'}Message: | |||||||||||||
| Comment by Andrew Witten (Inactive) [ 10/Aug/22 ] | |||||||||||||
|
I am noticing this however, when I add the following invariant:
We do hit this invariant, meaning there are cases where all recipients are hitting this return when handling command "_shardsvrReshardingOperationTime." In the case where we would hit this invariant the coordinator would be misled into thinking we can begin the critical section (that's because remainingTime.max is initialized here to 0, and if remainingTime.max < remainingTime.min is true, then remainingTime.max hasn't been assigned to since it was initialized). I'm wondering if
| |||||||||||||
| Comment by Andrew Witten (Inactive) [ 09/Aug/22 ] | |||||||||||||
|
One thing about this confuses me: when I look at the code for handling the command "_shardsvrReshardingOperationTime" I see this which then calls this and computes "remainingMillis" with a direct call remainingOperationTime directly, which never returns -1. |