  1. Core Server
  2. SERVER-71328

Ensure correct filtering metadata on donor shard after multiple failures

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.3.0-rc0, 6.0.5, 5.0.16
    • Affects Version/s: 5.0.13, 6.0.2, 6.1.0-rc4, 6.2.0-rc0
    • Component/s: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v6.2, v6.0, v5.0
    • Steps To Reproduce:

      1. A migration starts on the current primary node P0 of the donor shard.
      2. The commit of the migration fails due to a network error.
      3. _cleanup() is executed and, as part of it, endMetadataOp() persists the latest config time. This is useless, since we still do not know a config time that is inclusive of the migration commit.
      4. Asynchronous recovery of the migration is spawned.
      5. During recovery we read again from the config server and realize that the commit actually succeeded.
      6. We call completeMigration() and persist the migration decision in the coordinator document without calling endMetadataOp().
      7. A stepdown happens before the coordinator document is removed.
      8. A new primary node P1 of the donor shard is elected and tries to recover the migration again, since the coordinator document is still present.
      9. This time it finds that the migration decision has already been set to kCommitted and calls completeMigration() again without calling endMetadataOp().
      10. Since the configOpTime known to P1 is not inclusive of the latest committed migration, there is no guarantee that any subsequent refresh of the filtering metadata would include the committed migration.
    • Sprint: Sharding EMEA 2022-12-12, Sharding EMEA 2022-12-26, Sharding EMEA 2023-01-09

      The donor of a chunk migration calls ShardingStateRecovery::endMetadataOp(), which persists a configOpTime inclusive of the migration commit. This ensures that, after a stepdown, the next primary node sees the effect of the commit performed by the previous primary when it reads from the config server.

      The problem is that endMetadataOp() is not called after recovering a failed migration. If the donor experiences an error during the commit (network error) followed by a stepdown, there is no guarantee that the next primary node will install correct filtering metadata that includes the last migration.
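The gap described above can be illustrated with a small toy model. This is only a sketch: `DurableState`, `PrimaryMemory`, `replayFailure` and `refreshGuaranteedToSeeCommit` are hypothetical names invented for illustration and are not MongoDB server classes. Config times are modelled as plain integers, and the commit is assumed to advance the config time by one.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model only: none of these types are MongoDB server classes.
using ConfigTime = std::uint64_t;

enum class Decision { kCommitted, kAborted };

// State replicated within the donor shard, so it survives a stepdown.
struct DurableState {
    ConfigTime configTime = 0;         // last persisted config time
    std::optional<Decision> decision;  // decision stored in the coordinator document
};

// State that lives only in the memory of the current primary.
struct PrimaryMemory {
    ConfigTime configTime = 0;
};

// Replays the failure sequence and returns what the next primary (P1) inherits.
DurableState replayFailure(ConfigTime timeBeforeCommit) {
    DurableState durable;
    PrimaryMemory p0{timeBeforeCommit};

    // The commit is applied on the config server, but the reply to P0 is lost.
    const ConfigTime commitTime = timeBeforeCommit + 1;

    // Cleanup persists the config time P0 knows, which predates the commit.
    durable.configTime = p0.configTime;
    assert(durable.configTime < commitTime);

    // Recovery reads from the config server and learns that the commit succeeded.
    p0.configTime = commitTime;

    // The decision is persisted, but the newer config time is not.
    durable.decision = Decision::kCommitted;

    // P0 steps down here, and its in-memory config time is lost with it.
    return durable;
}

// In this model a refresh is only guaranteed to observe the commit when the
// config time the node starts from is at least the commit time.
bool refreshGuaranteedToSeeCommit(const DurableState& inherited, ConfigTime commitTime) {
    return inherited.configTime >= commitTime;
}
```

Calling `replayFailure(10)` in this model leaves P1 with a committed decision but a durable config time of 10, so `refreshGuaranteedToSeeCommit(inherited, 11)` is false.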

      The proposed solution is to add a VectorClock::waitForDurableConfigTime() call just before writing the commit decision to the migration coordinator document.
      This will be executed both when the commit completes without errors and during migration recovery.
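The ordering proposed above can be sketched with a toy model. It is not the server implementation: `waitForDurableConfigTimeSketch` and `persistDecisionSketch` are hypothetical stand-ins, config times are plain integers, and the model assumes that by the time the decision is known, the node's in-memory config time already covers the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model only: none of these types are MongoDB server classes.
using ConfigTime = std::uint64_t;

enum class Decision { kCommitted, kAborted };

// State replicated within the donor shard, so it survives a stepdown.
struct DurableState {
    ConfigTime configTime = 0;         // last persisted config time
    std::optional<Decision> decision;  // decision stored in the coordinator document
};

// Hypothetical stand-in for making the in-memory config time durable.
// The durable time only moves forward, never backward.
void waitForDurableConfigTimeSketch(ConfigTime inMemoryTime, DurableState& durable) {
    if (durable.configTime < inMemoryTime) {
        durable.configTime = inMemoryTime;
    }
}

// Single write path shared by the normal commit and by migration recovery:
// the config time is made durable first, and only then is the decision written.
void persistDecisionSketch(ConfigTime inMemoryTime, Decision decision, DurableState& durable) {
    waitForDurableConfigTimeSketch(inMemoryTime, durable);
    assert(durable.configTime >= inMemoryTime);
    durable.decision = decision;
}
```

Because the decision is written only after the config time is durable, any node in this model that finds a persisted decision also inherits a config time at least as recent as the one known when the decision was written. Repeating the call on a later primary with an older in-memory time leaves the durable time unchanged.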

            Assignee:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Reporter:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Votes:
            0
            Watchers:
            4
