[SERVER-58389] Capture NetworkInterfaceExceededTimeLimit and MaxTimeMSExpired errors in resharding participants
| Created: 09/Jul/21 | Updated: 29/Oct/23 | Resolved: 04/Aug/21 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.3, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Matthew Walak (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-autocommits | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: | v5.0 | ||||||||||||
| Sprint: | Sharding 2021-07-26, Sharding 2021-08-09 | ||||||||||||
| Story Points: | 1 | ||||||||||||
| Description |
|
During resharding, the participant shards (donors and recipients) send commands to the config server to update the coordinator document. NetworkInterfaceExceededTimeLimit and MaxTimeMSExpired errors are not considered retriable, yet they are reachable: these commands have a 30-second timeout, and one of the two errors is thrown when that timeout is reached. Such an error escapes both the generic command retry logic and the resharding-specific transient-error retry logic, and ultimately causes an fassert on whichever node is running resharding. The fix is to find the best place to catch and retry these errors. |
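The idea of catching and retrying the two timeout errors can be sketched as follows. This is only an illustration in Python, not the server's C++ code: `CommandError`, `retry_on_timeout`, and its parameters are hypothetical names, and the only details taken from this ticket are the two error names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# The two error names come from the ticket; everything else here is hypothetical.
TIMEOUT_ERRORS = frozenset({"NetworkInterfaceExceededTimeLimit", "MaxTimeMSExpired"})


class CommandError(Exception):
    """Hypothetical stand-in for a failed command that carries an error name."""

    def __init__(self, code_name: str):
        super().__init__(code_name)
        self.code_name = code_name


def retry_on_timeout(op: Callable[[], T], max_attempts: int = 5,
                     backoff_secs: float = 0.0) -> T:
    """Run op, retrying only when it fails with one of the timeout errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except CommandError as err:
            # Re-raise anything that is not a timeout, and give up on the last attempt.
            if err.code_name not in TIMEOUT_ERRORS or attempt == max_attempts:
                raise
            time.sleep(backoff_secs)
    raise AssertionError("unreachable")
```

A bounded attempt count is used here only to keep the sketch finite; where the retry belongs in the server and whether it should be bounded are the open questions this ticket raises.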
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fix version since branching activities occurred yesterday. This ticket will be in rc0 once it has been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 10/Aug/21 ] |
|
Author: {'name': 'Matt Walak', 'email': 'matt.walak@mongodb.com'}Message: |
| Comment by Githook User [ 04/Aug/21 ] |
|
Author: {'name': 'Matt Walak', 'email': 'matt.walak@mongodb.com'}Message: |
| Comment by Max Hirschhorn [ 14/Jul/21 ] |
|
I think we should remove the $maxTimeMS for the updates that shards perform during a resharding operation on the config server. It still doesn't make sense to me why sharding code imposes a $maxTimeMS of anything other than the remaining time of the user-supplied $maxTimeMS (which in resharding's case is infinite time). CC kaloian.manassiev |
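The alternative proposed in this comment, deriving any internal time limit from what remains of the user-supplied limit, might look like the sketch below. It is a Python illustration with hypothetical names (`effective_max_time_ms`, `user_deadline_ms`, `now_ms`), not server code; `None` stands for "no limit", which is the resharding case described above.

```python
from typing import Optional


def effective_max_time_ms(user_deadline_ms: Optional[int], now_ms: int) -> Optional[int]:
    """Return the time budget for an internal command, or None for no limit.

    user_deadline_ms is a hypothetical absolute deadline derived from the
    user-supplied limit; None means the user imposed no limit.
    """
    if user_deadline_ms is None:
        # No user-supplied limit, so the internal command gets none either.
        return None
    remaining = user_deadline_ms - now_ms
    if remaining <= 0:
        # The user's budget is already spent; fail instead of sending the command.
        raise TimeoutError("user-supplied time limit already expired")
    return remaining
```

Under this scheme a fixed 30-second cap never appears: the internal command is either unbounded or bounded by the caller's own remaining budget.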