[SERVER-61108] ReshardingCoordinatorService, Config Collection Deletes Can Time Out Waiting for Replication on Coordinator Doc, Leading to Fatal Assertion Created: 29/Oct/21 Updated: 29/Oct/23 Resolved: 29/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 5.0.0, 5.1.0-rc2 |
| Fix Version/s: | 5.2.0, 5.0.4, 5.1.0-rc3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Luis Osta (Inactive) | Assignee: | Luis Osta (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam1 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: | v5.1, v5.0 | ||||
| Sprint: | Sharding 2021-11-01 | ||||
| Participants: | |||||
| Story Points: | 1 | ||||
| Description |
|
Same as |
| Comments |
| Comment by Githook User [ 01/Nov/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'} Message: (cherry picked from commit b2531ed72eb81c7a9e4951e4aab93c7d190d3023) |
| Comment by Githook User [ 29/Oct/21 ] |
|
Author: {'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'} Message: |
| Comment by Max Hirschhorn [ 29/Oct/21 ] |
|
Also, it may be worth noting that the ReshardingCoordinator isn't required (nor really encouraged) to use "majority" write concern for its local writes. PrimaryOnlyService guarantees that all of the writes done by a particular Instance occur within a single term. Using {w: 1} and having writes follow writes would be sufficient, so long as the ReshardingCoordinator waits for the changes to be majority-committed before attempting to contact the remote shards (e.g. when communicating its decision). We don't allow a cluster to run multiple resharding operations, so blocking the thread synchronously isn't a big deal. I'd therefore rather leave the structure of things in the ReshardingCoordinator as-is. We can look to shift the ReshardingCoordinator over to using kNoWaitWriteConcern and WaitForMajorityService::waitUntilMajority() as part of some later refactoring. The risk-to-value tradeoff isn't good enough to warrant doing it now. |
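
The ordering described in the comment above (local writes with {w: 1}, then a single wait for majority commit before contacting remote shards) can be sketched as a toy model. This is only an illustration: `ToyReplicaSet`, `persist_then_notify`, and the integer opTimes are hypothetical and are not part of the server code base, and the loop stands in for whatever blocking wait the server would actually use.

```python
# Toy model only: none of these names exist in the MongoDB code base.
class ToyReplicaSet:
    def __init__(self):
        self.last_applied = 0     # opTime of the newest local write
        self.majority_point = 0   # newest opTime known to be majority-committed

    def write_w1(self, doc):
        """Local write with {w: 1}: returns its opTime without waiting."""
        self.last_applied += 1
        return self.last_applied

    def replicate(self):
        """Stand-in for a majority of nodes catching up to the primary."""
        self.majority_point = self.last_applied

    def is_majority_committed(self, op_time):
        return op_time <= self.majority_point


def persist_then_notify(rs, docs, notify):
    """Do every local write with {w: 1}, wait once, then contact the shards."""
    last_op = 0
    for doc in docs:
        last_op = rs.write_w1(doc)
    # Assumption of this sketch: all writes happen in one term and are ordered,
    # so waiting on the last opTime also covers every earlier write.
    while not rs.is_majority_committed(last_op):
        rs.replicate()  # a real server would block here instead of driving replication
    notify(last_op)     # only now is it safe to tell remote shards the decision
    return last_op
```

In this model the wait has no timeout at all, which mirrors the idea of separating "perform the write" from "wait for it to be majority-committed".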
| Comment by Max Hirschhorn [ 29/Oct/21 ] |
|
We'll also want to change makeFlushRoutingTableCacheUpdatesCmd() because CommandHelpers::appendMajorityWriteConcern() adds a wtimeout of 60 seconds by default. |
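
To make the 60-second default concrete, here is a minimal sketch of the write concern document such a helper would attach to a command. The Python function below is a hypothetical stand-in, not the signature of CommandHelpers::appendMajorityWriteConcern(), and `someCommand` is a placeholder command name. The sketch relies only on two documented facts: `wtimeout` is expressed in milliseconds, and a write concern without `wtimeout` waits indefinitely.

```python
DEFAULT_WTIMEOUT_MS = 60 * 1000  # the 60-second default mentioned in the comment


def append_majority_write_concern(cmd, wtimeout_ms=DEFAULT_WTIMEOUT_MS):
    """Hypothetical stand-in: return a copy of cmd with a majority write concern.

    Passing wtimeout_ms=None omits the timeout, so the wait is unbounded.
    """
    write_concern = {"w": "majority"}
    if wtimeout_ms is not None:
        write_concern["wtimeout"] = wtimeout_ms
    return {**cmd, "writeConcern": write_concern}


# "someCommand" is a placeholder, not a real server command.
with_default = append_majority_write_concern({"someCommand": "db.coll"})
without_timeout = append_majority_write_concern({"someCommand": "db.coll"}, wtimeout_ms=None)
```

With the default, `with_default["writeConcern"]` is `{"w": "majority", "wtimeout": 60000}`; with the timeout omitted, it is just `{"w": "majority"}`.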