[SERVER-58407] Resharding components do not retry on FailedToSatisfyReadPreference when targeting remote shard, leading to server crash Created: 09/Jul/21 Updated: 29/Oct/23 Resolved: 22/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0, 5.0.4, 5.1.0-rc2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Max Hirschhorn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-autocommits |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v5.1, v5.0 |
| Sprint: | Sharding 2021-11-01 |
| Participants: | |
| Story Points: | 1 |
| Description |
|
There are multiple places where the ReshardingCoordinatorService, ReshardingRecipientService, and ReshardingDonorService attempt to target the primary of a replica set shard:
Internally, these function calls go through RemoteCommandTargeterRS::findHost() and will throw a FailedToSatisfyReadPreference exception after kDefaultFindHostTimeout (15 seconds) if a primary is unavailable on the remote shard. This exception is caught and leads to an fassert() because, for example, it would be invalid for the participant shards to complete the resharding operation without performing a w:majority write on the config server primary. The resharding components should instead wait until a primary becomes available on the remote shard to avoid triggering this fassert().
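The proposed behavior (keep waiting for a primary rather than treating the timeout as fatal) could be sketched roughly as follows. This is a standalone illustration only, not the actual server change: the FailedToSatisfyReadPreference struct, findPrimaryWithRetry(), and its findHost / isCanceled / backoff parameters are hypothetical stand-ins and do not correspond to real server APIs.

```cpp
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for the server's FailedToSatisfyReadPreference error.
struct FailedToSatisfyReadPreference : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: calls findHost() until it returns a host.
// Only FailedToSatisfyReadPreference is treated as retryable; any other
// exception propagates immediately. isCanceled() gives the caller a way to
// stop waiting (for example on shutdown), in which case the last error is
// rethrown instead of being retried.
template <typename FindHostFn, typename CancelFn>
std::string findPrimaryWithRetry(FindHostFn findHost,
                                 CancelFn isCanceled,
                                 std::chrono::milliseconds backoff) {
    for (;;) {
        try {
            return findHost();
        } catch (const FailedToSatisfyReadPreference&) {
            if (isCanceled()) {
                throw;
            }
            // No primary yet: pause briefly, then try targeting again.
            std::this_thread::sleep_for(backoff);
        }
    }
}
```

With a wrapper of this shape, a temporarily unavailable primary delays the operation instead of surfacing an error to a caller that would fassert() on it.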
|
| Comments |
| Comment by Githook User [ 22/Oct/21 ] |
|
Author: {'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'} Message: (cherry picked from commit 03bec439f7c1ce1d8242de40eea130d9a3518a28) |
| Comment by Githook User [ 22/Oct/21 ] |
|
Author: {'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'} Message: (cherry picked from commit 03bec439f7c1ce1d8242de40eea130d9a3518a28) |
| Comment by Githook User [ 22/Oct/21 ] |
|
Author: {'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'} Message: |
| Comment by Max Hirschhorn [ 21/Oct/21 ] |
|
I've gone ahead and updated the ticket description with an example of how the recipient shard not retrying on FailedToSatisfyReadPreference while the config server primary is unavailable leads to the recipient shard crashing. |
| Comment by Blake Oler [ 26/Jul/21 ] |
|
I don't remember at the moment, but it's most likely stemming from trying to send commands to remote shards. I'll be sure to update once I see it again. |
| Comment by Max Hirschhorn [ 23/Jul/21 ] |
|
blake.oler, could you clarify from which component you had observed a FailedToSatisfyReadPreference exception? Similar to |