[SERVER-53258] [Resharding] Reject writes in opObserver when disallowWritesForResharding is true Created: 07/Dec/20 Updated: 29/Oct/23 Resolved: 22/Feb/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Alexander Taskov (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M2, PM-234-T-lifecycle |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 2021-01-25, Sharding 2021-02-22, Sharding 2021-03-08 |
| Participants: | |
| Linked BF Score: | 133 |
| Story Points: | 2 |
| Description |
|
This is a stop-gap solution and doesn't have to be pretty. Just for now, reject writes when:
The above conditions are satisfied by the call to writesShouldRunInDistributedTransaction, which represents the "blocking" state (still named after the earlier "distributed transaction" approach). Suggestion: for now, create a named error code "WriteRejectedForResharding" that's easy to verify in a test. In Milestone 3, a more robust solution will be designed, possibly one that queues writes. |
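As a rough illustration of the suggested stop-gap, here is a minimal, self-contained C++ sketch. It is not server code: apart from the writesShouldRunInDistributedTransaction name and the proposed WriteRejectedForResharding code name taken from the description, every type and function below is a hypothetical placeholder, and in the real server the rejection would presumably be a uassert backed by a dedicated entry in the error-code definitions rather than a thrown standard exception.

```cpp
// Hypothetical sketch only; placeholder types, not the server's real
// OpObserver / CollectionMetadata classes.
#include <stdexcept>
#include <string>

// Stand-in for the suggested named error code.
enum class ErrorCode { OK, WriteRejectedForResharding };

// Stand-in for the assertion the server would raise with the new code.
struct WriteRejectedException : std::runtime_error {
    explicit WriteRejectedException(const std::string& msg) : std::runtime_error(msg) {}
    ErrorCode code = ErrorCode::WriteRejectedForResharding;
};

// Placeholder for the per-collection sharding metadata the op observer consults.
struct CollectionMetadata {
    bool blockingWritesForResharding = false;  // set while the donor must reject writes

    // Mirrors the method named in the ticket: true while writes must be rejected
    // (the "blocking" state that is still named after distributed transactions).
    bool writesShouldRunInDistributedTransaction() const {
        return blockingWritesForResharding;
    }
};

// Hypothetical op observer hook, invoked for each insert/update/delete on the
// donor shard before the write is applied.
void onWriteObserved(const CollectionMetadata& metadata, const std::string& nss) {
    if (metadata.writesShouldRunInDistributedTransaction()) {
        // Fail with a dedicated, easily testable error code instead of a
        // generic one, per the suggestion above.
        throw WriteRejectedException("write to " + nss +
                                     " rejected because the collection is being resharded");
    }
}
```

The value of a dedicated code name is that a test can assert on WriteRejectedForResharding directly instead of pattern-matching a generic error.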
| Comments |
| Comment by Githook User [ 22/Feb/21 ] |
|
Author: {'name': 'Alex Taskov', 'email': 'alex.taskov@mongodb.com', 'username': 'alextaskov'}
Message: |
| Comment by Haley Connelly [ 08/Feb/21 ] |
|
Note: It exploited the fact that a shard can see collection updates upon a catalog cache refresh for a collection it does not own, which is not a good thing. This catalog cache behavior is subject to change and shouldn't be relied on for communication between the coordinator and donors for this ticket. |
| Comment by Max Hirschhorn [ 31/Dec/20 ] |
|
Tangentially related, I feel like there's something odd about how the resharding coordinator isn't notifying donor and recipient shards upon transitioning into CoordinatorStateEnum::kCommitted. Donor shards will drop the existing sharded collection when they hear about CoordinatorStateEnum::kCommitted, which could happen due to a (spurious) refresh triggered by something other than the coordinator. Recipient shards behave the same way for renaming the temporary resharding collection when they hear about CoordinatorStateEnum::kCommitted. Do we actually need CoordinatorStateEnum::kCommitted and CoordinatorStateEnum::kRenaming to be two separate states?

Also, I think scenarios (b) and (c) will be problematic because a client could end up seeing a transient NamespaceNotFound error response for writes (inserts might even implicitly create a new collection?), which wouldn't be considered a retryable error. Reads tend to prefer returning an empty result set over a NamespaceNotFound error response, which would make this a query correctness issue.

Is it possible to prevent the shard version refresh with CoordinatorStateEnum::kCommitted from completing until the RecipientStateMachine has completed the rename? Otherwise, I feel like we must prevent reads and writes on recipient shards until the rename completes. |
| Comment by Max Hirschhorn [ 31/Dec/20 ] |
|
I'm a little confused about the CollectionMetadata::writesShouldRunInDistributedTransaction(const UUID& originalUUID, const UUID& reshardingUUID) interface, specifically why the caller is responsible for supplying these values. I feel like originalUUID and reshardingUUID should always come from the TypeCollectionRecipientFields metadata. Some facts about CoordinatorStateEnum::kCommitted:
This means that, for a router and a shard to have successfully negotiated a shard version with CoordinatorStateEnum::kCommitted, the shard is definitely a recipient and possibly also a donor. I see there being the following 5 scenarios:

(a) The sharded collection namespace existing with <reshardingFields.recipientFields.existingUUID> because neither the drop nor the rename has happened yet.

Scenarios (a) and (b) are only possible when the recipient shard is also a donor. Of course it is also possible that the router has successfully "negotiated" a shard version where the shard has CoordinatorStateEnum::kCommitted because the router is performing a multi=true update or delete. getDestinedRecipient() deals with this by using CollectionShardingState::getOwnershipFilter() and relying on the chunk ownership being frozen for the sharded collection during a resharding operation. A non-recipient shard would see that it doesn't own the documents the multi=true update would be modifying and could error out. Disallowing multi=true updates on the sharded collection that is about to be dropped may not be worth the effort, though.

It is also worth calling out that _cm->getUUID() will always return <reshardingFields.uuid> because that's what CoordinatorStateEnum::kCommitted means. The UUID for the sharded collection namespace as it exists locally on the shard should instead be retrieved through db/catalog's CollectionCatalog.

I also prepared a diff in case seeing it in code is easier than bulleted lists.
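The diff mentioned above is not reproduced here. As a minimal, self-contained illustration of the two suggestions in this comment (not the author's diff, and with every type and function name a placeholder rather than the real server API): take existingUUID/reshardingUUID from the recipient fields of the resharding metadata instead of from the caller, and resolve the shard-local collection UUID through the catalog rather than through _cm->getUUID(), which under CoordinatorStateEnum::kCommitted always reports <reshardingFields.uuid>.

```cpp
// Illustrative sketch only; placeholder types, not the server's real API.
#include <optional>
#include <string>

using UUID = std::string;  // stand-in for the server's UUID type

// Placeholder for TypeCollectionRecipientFields.
struct RecipientFields {
    UUID existingUUID;    // UUID of the original sharded collection
    UUID reshardingUUID;  // UUID of the temporary resharding collection
};

// Placeholder for a shard-local lookup (CollectionCatalog in the server):
// returns the UUID of whatever collection currently exists at `nss` on this shard.
std::optional<UUID> lookupLocalCollectionUUID(const std::string& /*nss*/) {
    // Real code would consult the shard's local catalog here; returning nullopt
    // keeps this sketch self-contained.
    return std::nullopt;
}

enum class LocalCollectionState {
    kOriginal,        // local UUID == recipientFields.existingUUID (drop/rename not done)
    kRenamedTemp,     // local UUID == recipientFields.reshardingUUID (rename finished)
    kMissingOrOther,  // dropped, never present, or an unrelated collection
};

// Classify what the shard currently has at `nss` using only the recipient
// fields and the local catalog, never the routing table's UUID.
LocalCollectionState classifyLocalCollection(const std::string& nss,
                                             const RecipientFields& rf) {
    std::optional<UUID> localUuid = lookupLocalCollectionUUID(nss);
    if (!localUuid)
        return LocalCollectionState::kMissingOrOther;
    if (*localUuid == rf.existingUUID)
        return LocalCollectionState::kOriginal;
    if (*localUuid == rf.reshardingUUID)
        return LocalCollectionState::kRenamedTemp;
    return LocalCollectionState::kMissingOrOther;
}
```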
|