  1. Core Server
  2. SERVER-75333

The ChunkManagerTargeter unnecessarily invalidates just-refreshed routing info upon receiving UNSHARDED version

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 4.2.24, 4.4.19
    • Component/s: Sharding
    • ALL
    • Sharding EMEA 2023-04-03

      After receiving a StaleConfig error that carries an UNSHARDED version, the ChunkManagerTargeter uses this condition to decide whether its local routing info is older than what the shard has. When the shard has returned UNSHARDED, as happens after the election of a new cold primary, this comparison cannot be made reliably because UNSHARDED is not comparable with any other version, so the ChunkManagerTargeter invalidates the cache and loops around.
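The decision described above can be modeled as a comparison with an "unknown" outcome. This is a minimal sketch, not the server's actual C++ code: `compareVersions`, `mustInvalidate`, and the `{major, minor}` version shape are hypothetical names introduced only for illustration.

```javascript
// Hypothetical model of the targeter's staleness check (not MongoDB source code).
// A version is either the UNSHARDED marker or an object {major, minor}.
const UNSHARDED = Object.freeze({ unsharded: true });

// Returns "older", "newer", or "same" for the local version relative to the
// received one, or "unknown" when the two versions cannot be ordered.
function compareVersions(local, received) {
  if (local.unsharded || received.unsharded) {
    return "unknown"; // UNSHARDED is not comparable with any other version
  }
  if (local.major !== received.major) {
    return local.major < received.major ? "older" : "newer";
  }
  if (local.minor !== received.minor) {
    return local.minor < received.minor ? "older" : "newer";
  }
  return "same";
}

// Behavior described in the ticket: when the comparison is unknown, the
// cache is invalidated even if the local routing info was just refreshed.
function mustInvalidate(local, received) {
  const result = compareVersions(local, received);
  return result === "older" || result === "unknown";
}

console.log(mustInvalidate({ major: 5, minor: 0 }, UNSHARDED)); // true
```

In this model, every StaleConfig error carrying UNSHARDED forces an invalidation regardless of how fresh the local routing info is.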

      In the case of an election of a new shard primary, there can be hundreds of such StaleConfig errors, which results in a refresh convoy on the router that leaves it degraded for up to num_blocked_threads × refresh_latency.
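To get a feel for the bound above, here is a small worked example. The figures (200 blocked threads, 50 ms per refresh) are illustrative assumptions rather than measurements from the ticket, and `worstCaseDegradationMs` is a hypothetical helper.

```javascript
// Upper bound from the ticket: num_blocked_threads x refresh_latency.
// Assumes each blocked thread triggers its own refresh, one after another.
function worstCaseDegradationMs(numBlockedThreads, refreshLatencyMs) {
  return numBlockedThreads * refreshLatencyMs;
}

// Illustrative numbers only: 200 threads, 50 ms per refresh.
console.log(worstCaseDegradationMs(200, 50)); // 10000 (ms, i.e. 10 seconds)
```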

      This problem was fixed in 5.0 and later under SERVER-58271, but that fix depends on infrastructure built in those versions, so this ticket was filed to track the 4.2 and 4.4 fixes, which are not backports.

      Temporary workarounds:

      If upgrading to 5.0 is not possible, the next best temporary workarounds are:

      1. Switch to readConcern:local and readPreference:secondaryPreferred, which ensures with high probability that the secondary nodes are warm when promoted to primary
      2. (4.4 only) Use mirrored reads
      3. Pre-warm the filtering info on the SECONDARY node that will be promoted to PRIMARY by issuing a made-up find command directly against the shard at 'local' read concern. The command may block for a long time and will eventually fail, but as a side effect the cache on the secondary node will be warmed up:
        use DBName;
        db.runCommand({find: 'CollectionName', filter: {}, limit: 1, readConcern: { level: "local" }, shardVersion: [Timestamp(1, 1), ObjectId()]})
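If several collections need warming, the command above could be wrapped in a small helper. This is only a sketch: `buildPrewarmCommand` and `prewarm` are hypothetical names, and the helper simply repeats the ticket's command per collection, treating failure as expected.

```javascript
// Builds the same made-up find command shown in the ticket for one collection.
function buildPrewarmCommand(collectionName, shardVersion) {
  return {
    find: collectionName,
    filter: {},
    limit: 1,
    readConcern: { level: "local" },
    shardVersion: shardVersion,
  };
}

// Runs the command once per collection. Failure is expected (see the ticket),
// so errors are recorded instead of aborting the loop.
function prewarm(database, collectionNames, makeShardVersion) {
  return collectionNames.map((name) => {
    try {
      const res = database.runCommand(buildPrewarmCommand(name, makeShardVersion()));
      return { collection: name, ok: res.ok };
    } catch (err) {
      return { collection: name, ok: 0, error: String(err) };
    }
  });
}

// Usage in a shell connected directly to the shard's secondary (assumed setup):
// prewarm(db.getSiblingDB("DBName"), ["CollectionName"],
//         () => [Timestamp(1, 1), ObjectId()]);
```

The shard version is passed in as a factory so that the shell's `Timestamp` and `ObjectId` constructors are only needed at the call site.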

            kaloian.manassiev@mongodb.com Kaloian Manassiev