Project: Core Server, Issue: SERVER-64730

The 'forceShardFilteringMetadataRefresh' methods don't synchronise with each other (5.0 and newer versions)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.1.1, 5.0.14, 6.0.2, 6.2.0-rc0
    • Affects Version/s: 5.0.5, 5.1.1, 4.2.19, 5.2.1, 4.4.13, 5.3.0-rc4
    • Component/s: Sharding
    • Assigned Teams: Sharding EMEA
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v6.1, v6.0, v5.0
    • Sprint: Sharding EMEA 2022-04-04, Sharding EMEA 2022-04-18, Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13, Sharding EMEA 2022-06-27, Sharding EMEA 2022-07-11, Sharding EMEA 2022-07-25, Sharding EMEA 2022-08-08, Sharding EMEA 2022-08-22, Sharding EMEA 2022-09-05, Sharding EMEA 2022-09-19, Sharding EMEA 2022-10-03, Sharding EMEA 2022-10-17, Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14, Sharding EMEA 2022-12-12

      The forceShardFilteringMetadataRefresh method is the lowest-level shard version causality utility on the shards; its purpose is to guarantee that the shard version only ever moves forward.

      In versions 4.0 and earlier, it acquired the collection X (exclusive) lock and checked that the newly fetched version was actually newer than the one installed on the CSS (CollectionShardingState) before installing it. Starting with version 4.2, however, it was changed as part of the transactions project so that it no longer acquires the collection X lock.
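      A minimal sketch of the pre-4.2 check-and-install pattern, assuming a shard version can be modelled as a (major, minor) tuple and using a threading.Lock as a stand-in for the collection X lock. CollectionShardingStateModel and install_if_newer are hypothetical names for illustration, not server APIs.

```python
import threading
from typing import Optional, Tuple

# Hypothetical model: a shard version as (major, minor). The real version
# also carries an epoch/timestamp, which this sketch ignores.
Version = Tuple[int, int]


class CollectionShardingStateModel:
    """Hypothetical toy stand-in for the CSS: holds the installed filtering version."""

    def __init__(self) -> None:
        self._x_lock = threading.Lock()  # stands in for the collection X lock
        self._installed: Optional[Version] = None

    @property
    def installed(self) -> Optional[Version]:
        return self._installed

    def install_if_newer(self, refreshed: Version) -> bool:
        # The comparison and the write happen under the same exclusive lock,
        # so no other refresh can interleave between them.
        with self._x_lock:
            if self._installed is not None and refreshed <= self._installed:
                return False  # stale result from a slower refresh: drop it
            self._installed = refreshed
            return True
```

      Holding the lock across both the comparison and the write is what matters here; taking it only around the write would reopen the window in which an older result overwrites a newer one.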

      This means that two concurrent invocations of forceShardFilteringMetadataRefresh can race with each other and install versions that are not monotonically increasing (i.e., the shard version on a shard can go back in time).
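      To make the race concrete, the sketch below replays one bad interleaving of two refreshes step by step, with neither a lock nor a version comparison. The version numbers and the metadata change between the two fetches (for example a chunk migration commit) are hypothetical; each refresh is split into a fetch phase and an install phase so the schedule is deterministic.

```python
from typing import List, Tuple

Version = Tuple[int, int]  # hypothetical (major, minor) shard version


def unsynchronised_interleaving() -> List[Version]:
    """Replay one bad schedule of two concurrent refreshes; return every installed version."""
    authoritative_version: Version = (4, 0)
    installed_history: List[Version] = []

    # Refresh A fetches the routing metadata first...
    fetched_by_a = authoritative_version
    # ...then a hypothetical metadata change bumps the authoritative version.
    authoritative_version = (5, 0)
    # Refresh B starts later, sees the newer metadata and installs it at once.
    fetched_by_b = authoritative_version
    installed_history.append(fetched_by_b)
    # Refresh A finally resumes and blindly installs its older result.
    installed_history.append(fetched_by_a)
    return installed_history


def is_monotonic(history: List[Version]) -> bool:
    # The invariant forceShardFilteringMetadataRefresh is meant to preserve.
    return all(prev <= cur for prev, cur in zip(history, history[1:]))


history = unsynchronised_interleaving()
print(history)               # [(5, 0), (4, 0)]
print(is_monotonic(history))  # False: the shard version went back in time
```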

      After working on this ticket and backporting it to earlier branches, we believe the issue has already been addressed in 5.0 and newer versions (see the Fix Version/s field for the minor version in which the fix landed on each branch). Long story short, when a DDL operation installs new metadata under the critical section, we cancel any ongoing onShardVersionMismatch metadata refresh, so we don't have to worry about these two operations interleaving. Any onShardVersionMismatch that arrives after the critical section has been acquired blocks behind it. The same happens when we clear the filtering metadata.
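      The cancel-then-block behaviour can be sketched as a small model, assuming integer versions and using a threading.Condition in place of the server's critical section machinery. CriticalSectionModel, begin_refresh, finish_refresh and release_with are hypothetical names; this is an illustration of the idea, not the server's implementation.

```python
import threading
from typing import List, Optional


class CriticalSectionModel:
    """Hypothetical toy model, not the server's API: a DDL critical section
    that cancels in-flight refreshes and makes later refreshes wait behind it."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._held = False
        # One cancellation token per refresh currently in flight.
        self._in_flight: List[threading.Event] = []
        # Hypothetical filtering-metadata version, modelled as a plain int.
        self.installed: Optional[int] = None

    def acquire(self) -> None:
        """DDL enters the critical section and cancels refreshes already running."""
        with self._cond:
            self._held = True
            for token in self._in_flight:
                token.set()

    def release_with(self, new_version: Optional[int]) -> None:
        """DDL installs (or, with None, clears) metadata, then leaves the section."""
        with self._cond:
            self.installed = new_version
            self._held = False
            self._cond.notify_all()

    def begin_refresh(self) -> threading.Event:
        """Start an onShardVersionMismatch-style refresh; blocks while the section is held."""
        with self._cond:
            while self._held:
                self._cond.wait()
            token = threading.Event()
            self._in_flight.append(token)
            return token

    def finish_refresh(self, token: threading.Event, fetched: int) -> bool:
        """Install a refresh result unless a critical section cancelled it meanwhile."""
        with self._cond:
            # Checking the token and installing under the same lock as acquire()
            # means a DDL cannot slip in between the check and the write.
            self._in_flight.remove(token)
            if token.is_set():
                return False
            self.installed = fetched
            return True
```

      In a typical sequence, a refresh that began before acquire() has its token set and its result discarded by finish_refresh, while a refresh that calls begin_refresh during the DDL waits until release_with() and then sees the post-DDL state, so the two can never interleave.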

      The versions that still have this bug are 4.4 and 4.2. I propose investigating these two versions and opening a new ticket describing how to fix them. Sending this to Needs Scheduling so that we can triage it properly. SERVER-72322 will track this issue on the 4.4 and 4.2 branches.

    • Assignee: [DO NOT USE] Backlog - Sharding EMEA (backlog-server-sharding-emea)
    • Reporter: Kaloian Manassiev
    • Votes: 0
    • Watchers: 10