Core Server / SERVER-34878

If setShardVersion for dropCollection is not sent and the collection is re-created, migrations for the new collection can fail indefinitely

    • Sharding EMEA
    • ALL

      Apply the following patch to remove the setShardVersion from dropCollection (to deterministically reproduce this issue) and run jstests/sharding/upsert_sharded.js.

      diff --git a/src/mongo/db/s/config/sharding_catalog_manager_collection_operations.cpp b/src/mongo/db/s/config/sharding_catalog_manager_collection_operations.cpp
      index 16dd7c4..8c4f39f 100644
      --- a/src/mongo/db/s/config/sharding_catalog_manager_collection_operations.cpp
      +++ b/src/mongo/db/s/config/sharding_catalog_manager_collection_operations.cpp
      @@ -420,55 +420,6 @@ Status ShardingCatalogManager::dropCollection(OperationContext* opCtx, const Nam
           LOG(1) << "dropCollection " << nss.ns() << " collection marked as dropped";
      -    for (const auto& shardEntry : allShards) {
      -        auto swShard = shardRegistry->getShard(opCtx, shardEntry.getName());
      -        if (!swShard.isOK()) {
      -            return swShard.getStatus();
      -        }
      -
      -        const auto& shard = swShard.getValue();
      -
      -        SetShardVersionRequest ssv = SetShardVersionRequest::makeForVersioningNoPersist(
      -            shardRegistry->getConfigServerConnectionString(),
      -            shardEntry.getName(),
      -            fassertStatusOK(28781, ConnectionString::parse(shardEntry.getHost())),
      -            nss,
      -            ChunkVersion::DROPPED(),
      -            true);
      -
      -        auto ssvResult = shard->runCommandWithFixedRetryAttempts(
      -            opCtx,
      -            ReadPreferenceSetting{ReadPreference::PrimaryOnly},
      -            "admin",
      -            ssv.toBSON(),
      -            Shard::RetryPolicy::kIdempotent);
      -
      -        if (!ssvResult.isOK()) {
      -            return ssvResult.getStatus();
      -        }
      -
      -        auto ssvStatus = std::move(ssvResult.getValue().commandStatus);
      -        if (!ssvStatus.isOK()) {
      -            return ssvStatus;
      -        }
      -
      -        auto unsetShardingStatus = shard->runCommandWithFixedRetryAttempts(
      -            opCtx,
      -            ReadPreferenceSetting{ReadPreference::PrimaryOnly},
      -            "admin",
      -            BSON("unsetSharding" << 1),
      -            Shard::RetryPolicy::kIdempotent);
      -
      -        if (!unsetShardingStatus.isOK()) {
      -            return unsetShardingStatus.getStatus();
      -        }
      -
      -        auto unsetShardingResult = std::move(unsetShardingStatus.getValue().commandStatus);
      -        if (!unsetShardingResult.isOK()) {
      -            return unsetShardingResult;
      -        }
      -    }
      -
           LOG(1) << "dropCollection " << nss.ns() << " completed";

           catalogClient

      Let's say the setShardVersion for dropCollection is not sent, and the collection is re-created (and re-sharded). The old chunks then remain as garbage in the shard's config.cache.chunks.<ns>, because the shard will not refresh after the drop. (If it had refreshed, it would have seen that the collection was dropped and scheduled a task to drop its local config.cache.chunks.<ns> collection.)

      The setShardVersion for shardCollection for the new collection will cause the shard to refresh and find the new collection entry. Since the epoch is different from the collection entry in the shard's config.cache.collections, the shard will query for chunks with version $gte 0|0|<new epoch>.

      The shard will find the chunk for the new collection and persist it in its config.cache.chunks.<ns>, alongside the garbage chunks. As part of this, the shard will attempt to delete chunks in config.cache.chunks.<ns> with ranges that "overlap" the new chunk.

      However, the old chunks may not get deleted if their shard key sorts as "less than" the new shard key (see SERVER-34856).
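The way the garbage survives the overlap-based cleanup can be illustrated with a small simulation. This is a Python stand-in, not the server code: chunk ranges are reduced to plain comparable values, and the specific ranges and versions are hypothetical. The point it shows is that old-format shard keys sorting entirely below the new chunk's range never "overlap" it, so they are never deleted.

```python
# Hedged sketch of the shard's config.cache.chunks.<ns> cleanup step:
# after persisting the new collection's chunk, the shard deletes cached
# chunks whose range overlaps the new one. Old-epoch chunks whose shard
# keys sort entirely below the new range survive as garbage
# (the behavior described in SERVER-34856).

def overlaps(a, b):
    """Half-open range overlap test: [a0, a1) vs [b0, b1)."""
    return a[0] < b[1] and b[0] < a[1]

# Garbage chunks from the dropped collection; each entry is
# (range, version), with version as a (major, minor) pair.
cache = [((-100, -50), (2, 3)),
         ((-50, 0), (3, 0))]

# The new collection's single chunk covers the "high" key space.
new_chunk = ((1000, 2000), (1, 0))

# Overlap-based cleanup misses the garbage chunks entirely.
cache = [c for c in cache if not overlaps(c[0], new_chunk[0])]
cache.append(new_chunk)

print(len(cache))  # -> 3: both garbage chunks survive alongside the new chunk
```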

      . . .

      Now, at the beginning of a migration, the donor shard forces a refresh.

      The refresh queries config.chunks for chunks with version $gte the highest version of any chunk persisted on the shard in config.cache.chunks.<ns> (notice that the new collection's epoch is not included in the local query).

      If there is a garbage chunk with a major version greater than 1 (that is, at least one migration had occurred), then it will be returned by the local query and its version will be used as the $gte for the remote query. So, the remote query will find no chunks.
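A minimal sketch of why the remote query comes back empty (Python stand-in; versions are modeled as (major, minor) tuples, and the epoch is deliberately absent from the local max, mirroring the behavior described above — the function name is hypothetical):

```python
def refresh_start_version(cached_chunks):
    """Highest version of any persisted chunk. Note the epoch plays no
    part in this computation, so old-epoch garbage participates."""
    return max(version for _, version in cached_chunks)

# A garbage chunk at version 3|0 (its major version was bumped by a
# pre-drop migration) sits next to the new collection's chunk at 1|0.
cache = [("garbage", (3, 0)), ("new", (1, 0))]

# config.chunks for the new collection contains only the 1|0 chunk.
config_chunks = [("new", (1, 0))]

since = refresh_start_version(cache)                  # (3, 0)
fetched = [c for c in config_chunks if c[1] >= since]
print(fetched)  # -> []: the remote query finds no chunks
```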

      The ShardServerCatalogCacheLoader will return ConflictingOperationInProgress, which will cause the CatalogCache to retry the refresh 3 times and get the same result.

      Once the retries are exhausted, the donor will fail the moveChunk command with ConflictingOperationInProgress.
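The retry-and-fail sequence reduces to the following sketch (the retry count of 3 is from the description above; the function name and return-value shape are hypothetical):

```python
MAX_RETRIES = 3  # the CatalogCache retry count described above

def forced_refresh():
    # Every attempt hits the same stale cache state, so the loader
    # returns the same error each time.
    return "ConflictingOperationInProgress"

status = None
for _ in range(MAX_RETRIES):
    status = forced_refresh()
    if status == "OK":
        break

# Retries exhausted: the donor fails moveChunk with the loader's error.
print(status)  # -> ConflictingOperationInProgress
```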

            Assignee:
            backlog-server-sharding-emea [DO NOT USE] Backlog - Sharding EMEA
            Reporter:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Votes:
            0
            Watchers:
            7
