Core Server / SERVER-92847

The filtering metadata can be initialised in a corrupted state during initial sync

    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v8.0
    • CAR Team 2024-08-05
    • 200

      As explained in the final report on BF-34108, since SERVER-84754 the initialisation of the CollectionShardingRuntime has been split into two cases:

      • ShardingState not initialised: in this case we assume the node is part of a standalone replica set, so every collection is initialised as UNTRACKED.
      • ShardingState initialised: in this case the node is part of a cluster. Every entry in the filtering metadata must be either UNKNOWN or correct (where correct means UNTRACKED or TRACKED according to whether the collection is actually tracked by the sharding catalog).
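      The two initialisation cases and the invariant that must hold in a cluster can be sketched as a small model. This is a hypothetical illustration, not server code: FilteringState, initial_filtering_state and is_valid_in_cluster are names invented here, and a collection's "tracked" status is reduced to a boolean.

```python
from enum import Enum


class FilteringState(Enum):
    """Hypothetical stand-in for the states of a collection's filtering metadata."""
    UNKNOWN = "unknown"      # must be refreshed before it can be used
    UNTRACKED = "untracked"  # collection not tracked by the sharding catalog
    TRACKED = "tracked"      # collection tracked by the sharding catalog


def initial_filtering_state(sharding_state_initialized: bool) -> FilteringState:
    """Model of the two initialisation cases (hypothetical helper)."""
    if not sharding_state_initialized:
        # Node assumed to be part of a standalone replica set.
        return FilteringState.UNTRACKED
    # Node is part of a cluster: start as UNKNOWN and let a refresh resolve it.
    return FilteringState.UNKNOWN


def is_valid_in_cluster(cached: FilteringState, tracked_in_catalog: bool) -> bool:
    """Invariant once ShardingState is initialised: entry is UNKNOWN or correct."""
    if cached is FilteringState.UNKNOWN:
        return True
    expected = FilteringState.TRACKED if tracked_in_catalog else FilteringState.UNTRACKED
    return cached is expected
```

      In this model, an UNTRACKED entry for a collection the catalog considers tracked is exactly the corrupted state this issue describes: is_valid_in_cluster(FilteringState.UNTRACKED, True) is False.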

      A new node added to the replica set can have corrupted filtering metadata (i.e. UNTRACKED when the collection is actually tracked) until a refresh occurs.

      This can happen because a file-copy-based initial sync starts before the ShardingState is enabled (which only happens once the initial sync completes). During the sync, collections are created for the first time and their filtering metadata is accessed, which initialises it; since the ShardingState is not yet initialised, every collection is initialised as UNTRACKED.

      Once the node transitions to SECONDARY, its filtering metadata stays corrupted until the next refresh.
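      The sequence above can be reproduced with a toy model of a node's cached filtering metadata. This sketch is hypothetical: Node, access_collection, complete_initial_sync and simulate_file_copy_initial_sync are invented for illustration. It assumes only what the description states: an entry is initialised on first access, according to whether the ShardingState is enabled at that moment, and is not revisited afterwards.

```python
from enum import Enum
from typing import Dict


class FilteringState(Enum):
    UNKNOWN = "unknown"
    UNTRACKED = "untracked"
    TRACKED = "tracked"


class Node:
    """Toy model of a node's cached filtering metadata (hypothetical, not server code)."""

    def __init__(self) -> None:
        self.sharding_state_enabled = False
        self.metadata: Dict[str, FilteringState] = {}

    def access_collection(self, ns: str) -> FilteringState:
        # The first access initialises the entry according to the ShardingState.
        if ns not in self.metadata:
            self.metadata[ns] = (
                FilteringState.UNKNOWN if self.sharding_state_enabled
                else FilteringState.UNTRACKED
            )
        return self.metadata[ns]

    def complete_initial_sync(self) -> None:
        # The ShardingState is only enabled once initial sync has completed;
        # entries initialised earlier are left untouched.
        self.sharding_state_enabled = True


def simulate_file_copy_initial_sync(ns: str) -> FilteringState:
    node = Node()
    node.access_collection(ns)         # collection created and accessed during sync
    node.complete_initial_sync()       # ShardingState enabled afterwards
    return node.access_collection(ns)  # what the new secondary now reports
```

      Even if ns is tracked by the sharding catalog, the model returns UNTRACKED after the sync. The same access after the ShardingState is enabled would have produced UNKNOWN and forced a refresh.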

      The sharding protocol does not expect this scenario, which may cause queries to return wrong results.

      Possible solutions:

      This requires a fix. My suggestion would be to force a refresh of the filtering metadata once the ShardingState is enabled, to correct any inconsistencies. Other solutions are welcome.
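      The suggested fix can be sketched on the same kind of toy model: enabling the ShardingState also refreshes every entry initialised before it. Everything here is hypothetical. Node, refresh and enable_sharding_state are illustrative names, and the sharding catalog is modelled as a plain set of tracked namespaces. This is not the actual server change.

```python
from enum import Enum
from typing import Dict, Set


class FilteringState(Enum):
    UNKNOWN = "unknown"
    UNTRACKED = "untracked"
    TRACKED = "tracked"


class Node:
    """Toy model (hypothetical) of refreshing metadata when ShardingState is enabled."""

    def __init__(self, tracked_in_catalog: Set[str]) -> None:
        self.sharding_state_enabled = False
        self.metadata: Dict[str, FilteringState] = {}
        # Stand-in for the authoritative sharding catalog.
        self.tracked_in_catalog = tracked_in_catalog

    def access_collection(self, ns: str) -> FilteringState:
        if ns not in self.metadata:
            self.metadata[ns] = (
                FilteringState.UNKNOWN if self.sharding_state_enabled
                else FilteringState.UNTRACKED
            )
        return self.metadata[ns]

    def refresh(self, ns: str) -> None:
        # Replace the cached entry with the catalog's answer.
        self.metadata[ns] = (
            FilteringState.TRACKED if ns in self.tracked_in_catalog
            else FilteringState.UNTRACKED
        )

    def enable_sharding_state(self) -> None:
        self.sharding_state_enabled = True
        # Suggested fix: force a refresh of every entry initialised before the
        # ShardingState was enabled, so stale UNTRACKED entries are corrected.
        for ns in list(self.metadata):
            self.refresh(ns)
```

      This sketch refreshes eagerly. An alternative with the same effect would be to reset the pre-existing entries to UNKNOWN, which defers each refresh to the collection's next access. The trade-off is one catalog lookup per collection at enable time against a refresh on the first read.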

            Assignee: Enrico Golfieri (enrico.golfieri@mongodb.com)
            Reporter: Enrico Golfieri (enrico.golfieri@mongodb.com)
