[SERVER-67128] Pre-populate ShardRegistry during startup at Config shard Created: 08/Jun/22  Updated: 29/Oct/23  Resolved: 27/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam2, sharding-nyc-subteam2-catalog-poc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-67258 ShardNotFound PM-2290 poc Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2022-06-27, Sharding 2022-07-11
Participants:

Comments
Comment by Andrew Shuvalov (Inactive) [ 22/Jun/22 ]

Here is why I think it is all failing (a race, not deterministic):
The key is that ShardingCatalogClientImpl sets the default read preference for all config reads to nearest.
In the initial version of the catalog shard code this didn't matter, because we went through the ShardConfig wrapper, which always reads locally. After we started this large refactoring I changed the catalog client to always go remote, so now it actually picks the nearest replica. So why can this break the test (roughly 1 out of 5 runs)?
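
For reference, the default in question looks roughly like this (a sketch; the exact declaration in sharding_catalog_client_impl.cpp may differ):

// All reads issued through the catalog client default to this selector,
// so any replica, including a remote secondary, is a valid target.
const ReadPreferenceSetting kConfigReadSelector(ReadPreference::Nearest, TagSet{});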

ShardRegistry::_lookup calls ShardRegistryData::createFromCatalogClient to refresh the empty cache, and the client picks the nearest replica. If a secondary picks another secondary as nearest, it cannot run the request and fails with ShardNotFound, because the shard registry it needs to resolve the target is still empty.

The primary replica can usually load, because more often than not it picks itself (a loopback connection), and connecting to itself works. The replicas come online with some delay, which makes the tests flaky with a variety of failures.

The graph of those nearest selections is somewhat sticky, because it only changes after the next round of hellos. The test may fail before the graph becomes more favorable. Finding a new nearest host takes some time, and I have observed a replica repeatedly getting "host not found" while trying a secondary.

The fix for this problem is to pre-populate the ShardRegistry from the local DB, though even that can be delayed by replication.
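
A minimal sketch of what that pre-population could look like, assuming a hypothetical readShardsLocally helper and the kConfigsvrShardsNamespace constant (the actual patch may differ):

// Hypothetical sketch: on a config shard, read config.shards through a
// DBDirectClient, so no remote targeting (and therefore no ShardRegistry
// lookup) is needed to obtain the shard documents.
std::vector<ShardType> readShardsLocally(OperationContext* opCtx) {
    DBDirectClient client(opCtx);  // loopback read from the local node
    FindCommandRequest findRequest(NamespaceString::kConfigsvrShardsNamespace);
    auto cursor = client.find(std::move(findRequest));

    std::vector<ShardType> shards;
    while (cursor->more()) {
        shards.push_back(uassertStatusOK(ShardType::fromBSON(cursor->next())));
    }
    return shards;  // seed the ShardRegistry with these before serving traffic
}

As noted above, even this local read can lag: a secondary that has not yet replicated config.shards would seed an incomplete registry.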

Comment by Andrew Shuvalov (Inactive) [ 22/Jun/22 ]

This has to be reopened as the fix for SERVER-67258. I will fill in the details later.

Comment by Kaloian Manassiev [ 09/Jun/22 ]

Just to be clear, andrew.shuvalov@mongodb.com, do we need to block the port from opening until the shard registry data is loaded at all? It is not like the key manager, where the data is actually required to process incoming requests. I guess what I am asking is: if we just remove the wait for the load from the bootstrap, what fails?

Comment by Andrew Shuvalov (Inactive) [ 09/Jun/22 ]

kaloian.manassiev@mongodb.com thanks, confirmed. In previous experiments I had checked that ShardRegistryData is actually loaded during bootstrap; this time I tried your suggestion that it doesn't have to be. I inserted this code:

std::pair<ShardRegistryData, Timestamp> ShardRegistryData::createFromCatalogClient(
    OperationContext* opCtx, ShardFactory* shardFactory) {
    if (serverGlobalParams.clusterRole.is(ClusterRole::ShardServer)) {
        // Diagnostic only: spin until startup completes, to check whether the
        // first catalog read really arrives before startup is finished.
        while (!getGlobalServiceContext()->isStartupComplete()) {
            static int count = 0;
            LOGV2_ERROR(1, "!!!!! delay createFromCatalogClient", "count"_attr = ++count);
            sleepFor(Seconds(1));
        }
        LOGV2_ERROR(1, "!!!!! done waiting createFromCatalogClient");
    }
    // ... rest of the original function body unchanged ...

It spins once on every server, because the first read indeed arrives before startup is complete. But then startup completes and the "done" log line is printed. So there will be a delay while waiting for the port to open, but it will work.

I'm resolving the ticket.

Comment by Kaloian Manassiev [ 09/Jun/22 ]

andrew.shuvalov@mongodb.com, for the ShardRegistry, do we really need to pre-populate it, or should we just let the first access do the fetch? Is there something that runs before the ports are open (or on some path where we shouldn't block) which relies on it being populated?

For the KeyManager I kind of see the need, but for the ShardRegistry I can't think of one.
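
For illustration, the lazy alternative being asked about amounts to something like this (a simplified sketch, not the real ShardRegistry code, which wraps this pattern in a read-through cache):

// Simplified sketch of lazy population: the cache stays empty until the
// first getData() call, which pays for the remote config read inline.
class LazilyPopulatedRegistry {
public:
    const ShardRegistryData& getData(OperationContext* opCtx) {
        stdx::lock_guard<Latch> lk(_mutex);
        if (!_data) {
            // First access triggers the fetch. This is exactly where the
            // ShardNotFound race described above can bite: the remote read
            // needs the registry to resolve its target, and it is empty.
            _data = ShardRegistryData::createFromCatalogClient(opCtx, _shardFactory).first;
        }
        return *_data;
    }

private:
    Mutex _mutex = MONGO_MAKE_LATCH("LazilyPopulatedRegistry::_mutex");
    ShardFactory* _shardFactory;
    boost::optional<ShardRegistryData> _data;
};

The trade-off is exactly the question above: lazy loading is fine as long as nothing on a path that must not block depends on the registry before the first fetch can succeed.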
