SERVER-81229 (Core Server)

Move primary may not clean up cloned collections on failure

    • Sharding EMEA
    • Fully Compatible
    • ALL
    • v7.2, v7.0
    • Sharding EMEA 2023-10-16, Sharding EMEA 2023-10-30, CAR Team 2023-11-13, CAR Team 2023-11-27
    • 20
    • 3

      TL;DR: if the primary node of the donor shard steps down while the cloning phase of a movePrimary operation is in progress, the cloning procedure on the recipient side is not aborted. This leaves orphaned collections on the recipient, so any attempt to repeat the movePrimary operation fails with a NamespaceExists error.

      Technical details

      During the cloning phase of the movePrimary operation, the DDL coordinator calls the _shardsvrCloneCatalogData command on the recipient, which creates all unsharded collections on the recipient and copies their data from the donor. If a failure (step-down) occurs during this phase, the coordinator drops any data already cloned on the recipient and aborts the movePrimary operation.

      The bug is that the coordinator doesn't abort the data cloning procedure that may still be running on the recipient. Cleaning up the data already cloned on the recipient doesn't resolve the problem, because the cloning procedure can keep running in the background and recreate collections after the cleanup.
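
      The race described above can be illustrated with a toy model. This is only a sketch, not server code: plain dictionaries stand in for the donor and recipient shards, and a hypothetical generator named clone_catalog stands in for the recipient-side work of _shardsvrCloneCatalogData, copying one collection per step.

```python
def clone_catalog(donor, recipient):
    """Hypothetical stand-in for the recipient-side cloner.

    Copies one collection per step and fails if the target already exists.
    """
    for name, docs in donor.items():
        if name in recipient:
            raise RuntimeError(f"NamespaceExists: {name}")
        recipient[name] = list(docs)
        yield name


donor = {"db.a": [1], "db.b": [2], "db.c": [3]}
recipient = {}

# First attempt: the cloner has copied one collection when the step-down happens.
cloner = clone_catalog(donor, recipient)
next(cloner)

# The coordinator drops the cloned data but does not stop the cloner.
recipient.clear()

# The cloner keeps running in the background and clones the remaining collections.
for _ in cloner:
    pass
orphans = sorted(recipient)

# A new movePrimary attempt now hits an already existing collection.
try:
    list(clone_catalog(donor, recipient))
    retry_error = None
except RuntimeError as exc:
    retry_error = str(exc)

print(orphans)
print(retry_error)
```

      In this model the cleanup removes only what had been cloned up to that point, so the collections copied afterwards remain on the recipient and the retry fails.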

      User impacts

      The recipient shard can end up owning orphaned collections, which cause any attempt to repeat the movePrimary operation to fail. There is no evident business impact (data remains consistent), but manual intervention on the recipient is required to drop these orphaned collections before a new movePrimary attempt can succeed.

      Potential solution

      The cloning phase of a movePrimary is very expensive in terms of execution time (in production it could take hours), so the cloning operation on the recipient side must be aborted rather than joined. One idea is to tag the _shardsvrCloneCatalogData operation and to kill it (using the tag) when the movePrimary operation is recovered by the coordinator, before cleaning up any cloned data.
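
      The ordering proposed above (kill by tag first, clean up second) can be sketched with the same kind of toy model. Everything here is an assumption made for illustration: OperationRegistry, kill_by_tag, the tag strings and the clone_catalog generator are hypothetical names, not the server's actual interfaces.

```python
class OperationRegistry:
    """Hypothetical registry recording which operation tags have been killed."""

    def __init__(self):
        self._killed = set()

    def kill_by_tag(self, tag):
        self._killed.add(tag)

    def is_killed(self, tag):
        return tag in self._killed


def clone_catalog(donor, recipient, registry, tag):
    """Toy cloner: copies one collection per step, stops once its tag is killed."""
    for name, docs in donor.items():
        if registry.is_killed(tag):
            return
        if name in recipient:
            raise RuntimeError(f"NamespaceExists: {name}")
        recipient[name] = list(docs)
        yield name


donor = {"db.a": [1], "db.b": [2], "db.c": [3]}
recipient = {}
registry = OperationRegistry()

# First attempt: one collection is cloned before the step-down.
first = clone_catalog(donor, recipient, registry, tag="movePrimary-attempt-1")
next(first)

# Recovery: kill the tagged cloning operation first, then clean up.
registry.kill_by_tag("movePrimary-attempt-1")
cloned_after_kill = list(first)  # the killed cloner copies nothing more
recipient.clear()

# A new attempt with a fresh tag finds a clean recipient.
retry = list(clone_catalog(donor, recipient, registry, tag="movePrimary-attempt-2"))

print(cloned_after_kill)
print(retry)
```

      Because the cloner is stopped before the cleanup runs, nothing can be recreated afterwards, and the coordinator does not have to wait for a clone that might take hours to finish.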

            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            silvia.surroca@mongodb.com Silvia Surroca