[SERVER-59694] Resharding Prohibited Commands Incorrectly Assumes Consistency In Config.Cache.Collections Collection Created: 31/Aug/21  Updated: 29/Oct/23  Resolved: 31/Aug/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0
Fix Version/s: 5.0.4, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Luis Osta (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-58917 Wait until donors/recipients are awar... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Steps To Reproduce:
  1. Apply the git diff below
  2. Run the test with the following command

buildscripts/resmoke.py run jstests/sharding/resharding_prohibited_commands.js

diff --git a/src/mongo/db/s/shard_server_catalog_cache_loader.cpp b/src/mongo/db/s/shard_server_catalog_cache_loader.cpp
index 07f713c866..76024fe7bf 100644
--- a/src/mongo/db/s/shard_server_catalog_cache_loader.cpp
+++ b/src/mongo/db/s/shard_server_catalog_cache_loader.cpp
@@ -27,6 +27,7 @@
  *    it in the license file.
  */
 
+#include "mongo/util/time_support.h"
 #define MONGO_LOGV2_DEFAULT_COMPONENT ::mongo::logv2::LogComponent::kSharding
 
 #define LOGV2_FOR_CATALOG_REFRESH(ID, DLEVEL, MESSAGE, ...) \
@@ -118,7 +119,7 @@ Status persistCollectionAndChangedChunks(OperationContext* opCtx,
 
     // Mark the chunk metadata as refreshing, so that secondaries are aware of refresh.
     update.setRefreshing(true);
-
+    sleepsecs(6);
     Status status =
         updateShardCollectionsEntry(opCtx,
                                     BSON(ShardCollectionType::kNssFieldName << nss.ns()),

Sprint: Sharding 2021-09-06
Participants:
Linked BF Score: 151
Story Points: 1

 Description   

Background & Context
The JS test resharding_prohibited_commands.js queries the config.cache.collections collection to verify that the committing decision has been relayed to the recipient.

It assumes that the query will always find the cached collection document, with the `reshardingFields.state` property set either to 'committing' or to one of the other state values.

However, unlike other collections, the internal `config.cache.collections` collection has no such consistency guarantees. A query issued between the deletion of an old document and the insertion of the new one can find nothing.

In the ShardServerCatalogCacheLoader, the function that handles the refreshes to the `config.cache.collections` collection will first delete and then insert a new document in the case of an epoch change.

Since resharding uses epoch changes to invalidate the shard's cache of the collection information, there is a window between the deletion and the insertion of the document in the `config.cache.collections` collection during which the test can read an invalid state (no collection document).

The test therefore makes an invalid assumption about the consistency guarantees that the `config.cache.collections` collection provides.
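
The delete-then-insert window can be illustrated with a small in-memory model. This is only a sketch: the Map, `findOne`, `refreshOnEpochChange`, and the placeholder state string are hypothetical stand-ins for illustration, not the ShardServerCatalogCacheLoader implementation.

```javascript
// Hypothetical in-memory stand-in for config.cache.collections (not MongoDB code).
const cache = new Map();
const ns = "test.coll";

// Like a findOne() in the shell, return null when no document matches.
function findOne(nss) {
    return cache.has(nss) ? cache.get(nss) : null;
}

// Model of a refresh after an epoch change: delete the old entry first,
// then insert the new one. The optional callback runs in the gap between
// the two steps, where a concurrent reader could run.
function refreshOnEpochChange(nss, newDoc, observeGap) {
    cache.delete(nss);
    if (observeGap) {
        observeGap();
    }
    cache.set(nss, newDoc);
}

// "placeholder-earlier-state" is an illustrative value, not a real state name.
cache.set(ns, {_id: ns, epoch: 1, reshardingFields: {state: "placeholder-earlier-state"}});

let seenInGap;
refreshOnEpochChange(
    ns,
    {_id: ns, epoch: 2, reshardingFields: {state: "committing"}},
    () => { seenInGap = findOne(ns); });

const seenAfter = findOne(ns);
```

In this model a read inside the gap gets no document at all, while a read after the refresh sees the new entry. The sleepsecs(6) call in the diff above widens the same gap in the real server.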

Proposed Solution
A simple solution would be to update the find queries (3 in total) against the `config.cache.collections` collection in this test to first check whether the response is null (because no document matched the query) before using the value.

Such as:

return res && res.reshardingFields.state === "committing"
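
A slightly fuller sketch of the same idea, using the `reshardingFields.state` property named in the description, might look as follows. The helper names `isCommitting` and `pollUntil`, the simulated sequence of reads, and the placeholder state string are assumptions made for illustration. They are not taken from the test or from the committed fix. The extra guard on `reshardingFields` is likewise an assumption.

```javascript
// Null-safe check on a document read from config.cache.collections.
// Returns false rather than throwing when the document is missing.
function isCommitting(doc) {
    return Boolean(doc) && Boolean(doc.reshardingFields) &&
        doc.reshardingFields.state === "committing";
}

// Hypothetical polling helper: re-read until the predicate holds or the
// attempts run out. A real test would likely wait between attempts.
function pollUntil(readDoc, predicate, maxAttempts) {
    for (let i = 0; i < maxAttempts; i++) {
        if (predicate(readDoc())) {
            return true;
        }
    }
    return false;
}

// Simulated findOne() results: old document, the delete/insert gap, new document.
// "placeholder-earlier-state" is an illustrative value, not a real state name.
const reads = [
    {reshardingFields: {state: "placeholder-earlier-state"}},
    null,
    {reshardingFields: {state: "committing"}},
];

let next = 0;
const reached = pollUntil(() => reads[next++], isCommitting, reads.length);
```

With the guard in place, the null read in the middle of the sequence is treated as "not committing yet" and the poll simply tries again instead of failing on a property access.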



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 22/Sep/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59694 added null check for find queries in config.cache.collectionsn collection
Branch: v5.0
https://github.com/mongodb/mongo/commit/b2b3fecba2b6b72dc6e4da06303a4c19f2b8d2a5

Comment by Githook User [ 31/Aug/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-59694 added null check for find queries in config.cache.collectionsn collection
Branch: master
https://github.com/mongodb/mongo/commit/20f10822d217c3e64ddff45012042a30edd3fa93

Generated at Thu Feb 08 05:47:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.