[SERVER-29768] Primary does not need to do a complete chunk metadata read in ShardServerCatalogCacheLoader Created: 21/Jun/17  Updated: 30/Oct/23  Resolved: 03/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.5.12

Type: Bug
Priority: Major - P3
Reporter: Dianna Hohensee (Inactive)
Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done after SERVER-27714 Create indexes on shards for config.c... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-08-21
Participants:

 Description   

Once indexes are in, the primary can safely do incomplete metadata reads. Put this helper function in

/**
 * Attempt to read the collection and chunk metadata. May not read all the updates if the metadata
 * for the collection is being updated concurrently.
 *
 * TODO: this is unsafe without an index on the relevant chunks collection. Will be used when
 * SERVER-27714 is done.
 *
 * If the epoch changes while reading the chunks, returns an empty object.
 */
StatusWith<CollectionAndChangedChunks> getIncompletePersistedMetadataSinceVersion(
    OperationContext* opCtx,
    const NamespaceString& nss,
    ChunkVersion version) {
 
    try {
        CollectionAndChangedChunks collAndChunks = getPersistedMetadataSinceVersion(
            opCtx,
            nss,
            version,
            false);
        if (collAndChunks.changedChunks.empty()) {
            // Found a collections entry, but the chunks are being updated.
            return CollectionAndChangedChunks();
        }
 
        // Make sure the collections entry epoch has not changed since we began reading chunks --
        // an epoch change between reading the collections entry and reading the chunk metadata
        // would invalidate the chunks.
 
        auto afterShardCollectionsEntry = uassertStatusOK(readShardCollectionsEntry(opCtx, nss));
        if (collAndChunks.epoch != afterShardCollectionsEntry.getEpoch()) {
            // The collection was dropped and recreated since we began. Return empty results.
            return CollectionAndChangedChunks();
        }
 
        return collAndChunks;
    } catch (const DBException& ex) {
        Status status = ex.toStatus();
        if (status == ErrorCodes::NamespaceNotFound) {
            return CollectionAndChangedChunks();
        }
        return Status(
                ErrorCodes::OperationFailed,
                str::stream() << "Failed to reload local metadata due to '"
                              << status.toString()
                              << "'.");
    }
}
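
The read-then-revalidate pattern in the helper above can be sketched in isolation as follows. This is only an illustration under stated assumptions: FakeStore, Chunk, CollectionEntry, and readIncomplete are hypothetical stand-ins invented for this sketch, not the server's types or API, and the epoch is modeled as a plain string.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the server types; not the real MongoDB API.
struct Chunk { int version; };
struct CollectionEntry { std::string epoch; };
struct CollectionAndChangedChunks {
    std::string epoch;
    std::vector<Chunk> changedChunks;
};

// Hypothetical in-memory store standing in for the persisted shard metadata.
struct FakeStore {
    CollectionEntry entry;
    std::vector<Chunk> chunks;
    bool recreateDuringRead = false;  // simulate a drop/recreate racing with the read

    CollectionEntry readEntry() const { return entry; }

    // Returns every chunk whose version is >= sinceVersion. If the race is
    // being simulated, the collection epoch changes as a side effect.
    std::vector<Chunk> readChunksSince(int sinceVersion) {
        std::vector<Chunk> out;
        for (const auto& c : chunks) {
            if (c.version >= sinceVersion) out.push_back(c);
        }
        if (recreateDuringRead) entry.epoch = "epoch-2";
        return out;
    }
};

// Reads the collection entry, then the chunks, then re-reads the entry.
// Returns an empty result if no chunks were found or if the epoch changed
// between the two entry reads.
CollectionAndChangedChunks readIncomplete(FakeStore& store, int sinceVersion) {
    const CollectionEntry before = store.readEntry();
    CollectionAndChangedChunks result{before.epoch, store.readChunksSince(sinceVersion)};
    if (result.changedChunks.empty()) {
        return {};  // nothing readable yet
    }
    const CollectionEntry after = store.readEntry();
    if (after.epoch != before.epoch) {
        return {};  // collection was dropped and recreated mid-read
    }
    return result;
}
```

With a stable epoch the caller gets whatever chunks were readable at that moment; with a changed epoch it gets an empty result and is expected to retry later.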



 Comments   
Comment by Githook User [ 10/Aug/17 ]

Author: Dianna Hohensee <dianna.hohensee@10gen.com>

Message: SERVER-29768 Primary does not need to do a complete chunk metadata read in ShardServerCatalogCacheLoader
Branch: master
https://github.com/mongodb/mongo/commit/04ad5cd366f476e388a0c11f4748b9a4f6306d79

Comment by Dianna Hohensee (Inactive) [ 26/Jul/17 ]

I'm upgrading this task to a bug fix. If the persistence task on the primary fails, it leaves 'refreshing' == true, so no metadata can be read. I think that if a getChunksSince call running _getLoaderMetadata races with the persistence task scheduled by that same getChunksSince, and the persistence attempt wins, then _getLoaderMetadata gets stuck, and the CatalogCache won't call again until the shard loader returns a response via the callbackFn.

This ticket fixes that issue: the primary won't wait for a complete routing table view.
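
A single-threaded sketch of the failure mode described above, under loud assumptions: PersistedMetadata, failedPersistenceTask, completeRead, and incompleteRead are hypothetical names made up for illustration, and the real loader is concurrent and considerably more involved. It only shows why a reader that insists on a complete view can be blocked by a flag that is never cleared, while an incomplete read is not.

```cpp
#include <vector>

// Hypothetical model of the persisted state; names are illustrative only.
struct PersistedMetadata {
    bool refreshing = false;         // set while a persistence task is applying updates
    std::vector<int> chunkVersions;  // chunk versions already written
};

// Simulates a persistence task that fails midway: it sets the flag, writes
// part of the update, and returns without ever clearing the flag.
void failedPersistenceTask(PersistedMetadata& md) {
    md.refreshing = true;
    md.chunkVersions.push_back(1);
    // Simulated failure: 'refreshing' is left set.
}

// A complete read refuses to return anything while 'refreshing' is set.
// Returns true and fills 'out' only when the flag is clear.
bool completeRead(const PersistedMetadata& md, std::vector<int>& out) {
    if (md.refreshing) {
        return false;
    }
    out = md.chunkVersions;
    return true;
}

// An incomplete read returns whatever has been written so far, ignoring the flag.
std::vector<int> incompleteRead(const PersistedMetadata& md) {
    return md.chunkVersions;
}
```

After failedPersistenceTask runs, completeRead keeps returning false no matter how often it is retried, whereas incompleteRead still returns the chunk that was written.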
