[SERVER-75333] The ChunkManagerTargeter unnecessarily invalidates just-refreshed routing info upon receiving UNSHARDED version Created: 27/Mar/23  Updated: 28/Mar/23  Resolved: 28/Mar/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.24, 4.4.19
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-58271 Stop forcing collection version refre... Closed
Operating System: ALL
Sprint: Sharding EMEA 2023-04-03
Participants:
Case:

 Description   

After receiving a StaleConfig error, the ChunkManagerTargeter uses a version comparison to decide whether the routing info it has locally is older than what the shard has. When the shard has returned UNSHARDED, as happens after the election of a new cold primary, this comparison cannot be made reliably, because UNSHARDED is not comparable with any other version. The ChunkManagerTargeter therefore invalidates the cache, even if it was just refreshed, and loops around.
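The decision described above can be sketched in plain JavaScript. This is a simplified model for illustration, not the server's code: the `{epoch, major, minor}` version shape, the `UNSHARDED` sentinel object, and the function names `compareVersions` and `mustInvalidateCache` are all assumptions made for this example.

```javascript
// Simplified model only; names and version shape are hypothetical.
// A version is either the UNSHARDED sentinel or {epoch, major, minor}.
const UNSHARDED = Object.freeze({unsharded: true});

// Classifies the local version relative to the one reported by the shard.
function compareVersions(local, remote) {
    if (local === UNSHARDED || remote === UNSHARDED) return "incomparable";
    if (local.epoch !== remote.epoch) return "incomparable";
    if (local.major !== remote.major) {
        return local.major < remote.major ? "older" : "notOlder";
    }
    return local.minor < remote.minor ? "older" : "notOlder";
}

// Models the behaviour this ticket describes: when the comparison cannot be
// made, the cache is invalidated, even if it was refreshed a moment ago.
function mustInvalidateCache(local, versionFromStaleConfig) {
    return compareVersions(local, versionFromStaleConfig) !== "notOlder";
}

const justRefreshed = {epoch: "e1", major: 5, minor: 0};
console.log(mustInvalidateCache(justRefreshed, {epoch: "e1", major: 4, minor: 2})); // false
console.log(mustInvalidateCache(justRefreshed, UNSHARDED)); // true
```

In this model, a comparable version lets the targeter keep freshly loaded routing info, while UNSHARDED always forces another invalidation.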

When a new shard primary is elected, there can be hundreds of such StaleConfig errors. The result is a refresh convoy on the router, which leaves it degraded for up to num_blocked_threads × refresh_latency.
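As a rough worked example of that upper bound, the sketch below multiplies the two factors. The function name `worstCaseDegradedMs` and the input numbers (300 blocked threads, 50 ms per refresh) are hypothetical values chosen for illustration, not measurements from this ticket.

```javascript
// Upper bound from the description: each blocked thread waits behind a
// refresh, so the worst case is the product of the two factors.
function worstCaseDegradedMs(numBlockedThreads, refreshLatencyMs) {
    return numBlockedThreads * refreshLatencyMs;
}

// Hypothetical inputs: 300 blocked threads, 50 ms per refresh.
console.log(worstCaseDegradedMs(300, 50) / 1000); // 15 (seconds)
```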

This problem was fixed in 5.0 and later under SERVER-58271, but that fix depends on infrastructure built in those versions. This ticket was filed to track separate fixes for 4.2 and 4.4, which are not backports.

Temporary workarounds:

If upgrading to 5.0 is not possible, the next best temporary workarounds are:

  1. Switch to readConcern:local and readPreference:secondaryPreferred, which makes it highly probable that the secondary nodes are already warm when one of them is promoted to primary
  2. (4.4 only) Use mirrored reads
  3. Pre-warm the filtering info on the SECONDARY node that will be promoted to PRIMARY by issuing a made-up find command directly against the shard, at 'local' read concern. The command may block for a long time and will eventually fail, but as a side effect the cache on the secondary node will be warmed up:

    use DBName;
    db.getMongo().setReadPref("secondaryPreferred")
    db.runCommand({find: 'CollectionName', filter: {}, limit: 1, readConcern: { level: "local" }, shardVersion: [Timestamp(1, 1), ObjectId()]})
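If several collections need pre-warming, the same command can be wrapped in a small helper. This is only a sketch under stated assumptions: `prewarmCollections` is a hypothetical name, and the database handle and the `Timestamp`/`ObjectId` constructors are passed in as parameters so the helper does not depend on shell globals.

```javascript
// Hypothetical helper: issues the made-up find from workaround 3 once per
// collection. Each command is expected to fail; the failure is recorded,
// not treated as fatal, because the warm-up is a side effect.
function prewarmCollections(db, collectionNames, makeTimestamp, makeObjectId) {
    const results = {};
    for (const name of collectionNames) {
        try {
            results[name] = db.runCommand({
                find: name,
                filter: {},
                limit: 1,
                readConcern: {level: "local"},
                // Made-up shard version, as in the command above.
                shardVersion: [makeTimestamp(1, 1), makeObjectId()],
            });
        } catch (e) {
            results[name] = {ok: 0, errmsg: String(e)};
        }
    }
    return results;
}
```

In the shell, after selecting the database and setting the read preference as shown above, this would be called as `prewarmCollections(db, ['CollectionName'], Timestamp, ObjectId)`.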
    



 Comments   
Comment by Kaloian Manassiev [ 28/Mar/23 ]

After studying the state of the BatchWriteExecutor in v4.2 and the code changes for SERVER-58271, we have concluded that it is too risky to attempt a different kind of fix. Backporting SERVER-58271 is also not an option, because it would require backporting a large part of the CatalogCache rewrite, which was done under PM-1645.

I am closing this ticket with the understanding that there is a temporary workaround, which is explained in the description.

Generated at Thu Feb 08 06:29:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.