[SERVER-53258] [Resharding] Reject writes in opObserver when disallowWritesForResharding is true Created: 07/Dec/20  Updated: 29/Oct/23  Resolved: 22/Feb/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: Alexander Taskov (Inactive)
Resolution: Fixed Votes: 0
Labels: PM-234-M2, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-53631 [Resharding] Verify UUIDs are correct... Closed
is depended on by SERVER-53923 Enforce reshardingCriticalSectionTime... Closed
is depended on by SERVER-53653 [Resharding] Take the critical sectio... Closed
is depended on by SERVER-54672 Enable test of writes succeeding afte... Closed
Problem/Incident
Related
related to SERVER-55852 Shards first acquire LockManager lock... Closed
related to SERVER-52742 Ensure both donorFields and recipient... Closed
related to SERVER-57665 Remove the unused disallowWritesForRe... Closed
is related to SERVER-53911 Increment countWritesDuringCriticalSe... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2021-01-25, Sharding 2021-02-22, Sharding 2021-03-08
Participants:
Linked BF Score: 133
Story Points: 2

 Description   

This is a stop-gap solution and doesn't have to be pretty. Just for now, reject writes when:

  1. The collection has reshardingFields, and
  2. The collection is in the states:
    1. kMirroring, or
    2. kRenaming and the UUID is the original collection.

The above conditions are satisfied with the call to writesShouldRunInDistributedTransaction, which is the "blocking" state that's still named as the "distributed transaction" state.

Suggestion: for now, create a named error code "WriteRejectedForResharding" that's easy to verify in a test. In Milestone 3, a more robust solution will be designed to possibly queue writes.



 Comments   
Comment by Githook User [ 22/Feb/21 ]

Author:

{'name': 'Alex Taskov', 'email': 'alex.taskov@mongodb.com', 'username': 'alextaskov'}

Message: SERVER-53258 Reject writes during resharding operations using critical section
Branch: master
https://github.com/mongodb/mongo/commit/b5c018e00951264ac6bf0aabdfdfa9274ec186f1

Comment by Haley Connelly [ 08/Feb/21 ]

Note:  SERVER-52742 made it so donors see coordinator changes after the decision is persisted via refreshing on the original collection it no longer owned chunks for. 

It exploited that a shard can see collection updates upon a catalog cache refresh for a collection it does not own - not a good thing.

This catalog cache behavior is subject to change and shouldn't be relied on for communication between the coordinator and donors for this ticket. 

Comment by Max Hirschhorn [ 31/Dec/20 ]

Tangentially related, I feel like there's something odd about how the resharding coordinator isn't notifying donor and recipient shards upon transitioning into CoordinatorStateEnum::kCommitted. Donor shards will drop the existing sharded collection when they hear about CoordinatorStateEnum::kCommitted, which could happen due to a (spurious) refresh triggered by something other than the coordinator. Recipient shards are the same way for renaming the temporary resharding collection when they hear about CoordinatorStateEnum::kCommitted. Do we actually need CoordinatorStateEnum::kCommitted and CoordinatorStateEnum::kRenaming to be two separate state?

Also, I think scenarios (b) and (c) will be problematic where a client could end up seeing a transient NamespaceNotFound error response for writes (maybe even implicitly create a new collection for inserts?) which wouldn't be considered a retryable error. Reads tend to prefer returning an empty result set over a NamespaceNotFound error response which would make this a query correctness issue. Is it possible to prevent the shard version refresh with CoordinatorStateEnum::kCommitted from completing until the RecipientStateMachine has completed the rename? Otherwise, I feel like we must prevent reads and writes on recipient shards until the rename completes.

Comment by Max Hirschhorn [ 31/Dec/20 ]

I'm a little confused about the CollectionMetadata::writesShouldRunInDistributedTransaction(const UUID& originalUUID, const UUID& reshardingUUID) interface for why the caller is responsible for supplying these values. I feel like originalUUID and reshardingUUID should always be coming from the TypeCollectionRecipientFields metadata.

Some facts about CoordinatorStateEnum::kCommitted:

  • CoordinatorStateEnum::kCommitted means the config.collections entry for the temporary resharding collection has been removed. (link)
  • CoordinatorStateEnum::kCommitted means the config.collections entry for the existing sharded collection namespace has its shard key and collection UUID updated. The recipientFields from the temporary resharding collection are copied over too. (link)
  • CoordinatorStateEnum::kCommitted means the config.chunks and config.tags entries for the existing sharded collection namespace have been updated. (link)

This means for a router and shard to have successfully negotiated a shard version with CoordinatorStateEnum::kCommitted that the shard is definitely a recipient and possibly also a donor. I see there being the follow 5 scenarios:

(a) The sharded collection namespace existing with <reshardingField.recipientFields.existingUUID> because neither the drop nor the rename have happened yet.
(b) The sharded collection namespace not existing due to the drop having happened but not the rename yet.
(c) The sharded collection namespace not existing due to the collection never having existed on the non-donor shard and the rename not having happened yet.
(d) The sharded collection namespace existing with <reshardingFields.uuid> because the rename has happened already.
(e) The sharded collection namespace existing with any other UUID because the protocol for resharding a collection was violated.

Scenarios (a) and (b) are only possible when the recipient shard is also a donor.

Of course it is also possible that the router has successfully "negotiated" a shard version where the shard has CoordinatorStateEnum::kCommitted because the router is performing a multi=true update or delete. getDestinedRecipient() deals with this by using CollectionShardingState::getOwnershipFilter() and relying on the chunk ownership being frozen for the sharded collection during a resharding operation. A non-recipient shard would see that it doesn't own the documents the multi=true update would be modifying and could error out. Disallowing multi=true updates on the sharded collection to be dropped may not be worth the effort though.

It is also worth calling out that _cm->getUUID() will always return <reshardingFields.uuid> because that's what CoordinatorStateEnum::kCommitted means. The UUID for the sharded collection namespace as it exists locally on the shard should be retrieved through db/catalog's CollectionCatalog.

I also prepared a diff in case seeing it in code is easier than bulleted lists.

diff --git a/src/mongo/db/s/collection_metadata.cpp b/src/mongo/db/s/collection_metadata.cpp
index 43233ae94c..f30c2b754e 100644
--- a/src/mongo/db/s/collection_metadata.cpp
+++ b/src/mongo/db/s/collection_metadata.cpp
@@ -94,9 +94,12 @@ boost::optional<ShardKeyPattern> CollectionMetadata::getReshardingKeyIfShouldFor
     return ShardKeyPattern(donorFields->getReshardingKey());
 }
 
-bool CollectionMetadata::writesShouldRunInDistributedTransaction(const UUID& originalUUID,
-                                                                 const UUID& reshardingUUID) const {
-    auto reshardingFields = getReshardingFields();
+bool CollectionMetadata::writesShouldRunInDistributedTransaction(
+    const UUID& currentCollectionUUID) const {
+    if (!isSharded())
+        return false;
+
+    const auto& reshardingFields = getReshardingFields();
     if (!reshardingFields)
         return false;
 
@@ -108,8 +111,8 @@ bool CollectionMetadata::writesShouldRunInDistributedTransaction(const UUID& ori
         case CoordinatorStateEnum::kApplying:
             return false;
         case CoordinatorStateEnum::kMirroring:
-        case CoordinatorStateEnum::kCommitted:
             return true;
+        case CoordinatorStateEnum::kCommitted:
         case CoordinatorStateEnum::kRenaming:
             break;
         case CoordinatorStateEnum::kDone:
@@ -117,23 +120,30 @@ bool CollectionMetadata::writesShouldRunInDistributedTransaction(const UUID& ori
             return false;
     }
 
-    // Handle kRenaming:
-    auto currentCollectionUUID = *_cm->getUUID();
+    const auto& recipientFields = reshardingFields->getRecipientFields();
+    uassert(5325800,
+            "Missing 'recipientFields' in collection metadata for resharding operation that has"
+            " committed",
+            recipientFields);
+
+    const auto& originalUUID = recipientFields->getExistingUUID();
+    const auto& reshardingUUID = reshardingFields->getUuid();
 
-    // Renaming has not completed
     if (currentCollectionUUID == originalUUID) {
+        // This shard must be both a donor and recipient. Neither the drop or renameCollection have
+        // happened yet. Writes should continue to be blocked.
         return true;
+    } else if (currentCollectionUUID == reshardingUUID) {
+        // The renameCollection has happened. Writes no longer need be blocked on this shard.
+        return false;
+    } else {
+        uasserted(
+            ErrorCodes::InvalidUUID,
+            "Expected collection to have either the original UUID {} or the resharding UUID {}, but"
+            " the collection instead has UUID {}"_format(originalUUID.toString(),
+                                                         reshardingUUID.toString(),
+                                                         currentCollectionUUID.toString()));
     }
-
-    // Else, renaming must have completed, and the new UUID must be equal to the resharding UUID.
-    uassert(ErrorCodes::InvalidUUID,
-            "Expected collection to have either the original UUID {} or the resharding UUID {}, "
-            "but the collection instead has UUID {}"_format(originalUUID.toString(),
-                                                            reshardingUUID.toString(),
-                                                            currentCollectionUUID.toString()),
-            currentCollectionUUID == reshardingUUID);
-
-    return false;
 }
 
 BSONObj CollectionMetadata::extractDocumentKey(const BSONObj& doc) const {

Generated at Thu Feb 08 05:30:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.