TL;DR: if the primary node of the donor shard steps down while the cloning phase of a movePrimary operation is in progress, the cloning procedure on the recipient side is not aborted. This leaves orphaned collections on the recipient and causes any subsequent attempt to repeat the movePrimary operation to fail with a NamespaceExists error.
During the cloning phase of the movePrimary operation, the DDL coordinator invokes the _shardsvrCloneCatalogData command on the recipient, which creates all unsharded collections of the database on the recipient and copies their data from the donor. In the event of a failure (e.g. a step-down) during this phase, the coordinator drops any data that may already have been cloned on the recipient and aborts the movePrimary operation.
The bug is that the coordinator does not abort the cloning procedure that may still be running on the recipient. Cleaning up the data already cloned on the recipient does not resolve the problem, because the cloning procedure may still be running in the background and recreate collections after the cleanup.
As a result, the recipient shard can end up owning orphaned collections, which cause any attempt to repeat the movePrimary operation to fail. There is no evident business impact (the data remains consistent), but manual intervention on the recipient is required to drop the orphaned collections before a new movePrimary attempt can succeed.
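As a workaround, the orphaned collections can be dropped by connecting directly to the primary of the recipient shard before retrying the operation. A minimal mongosh sketch, assuming illustrative names ("testDB", "orphanedColl", "shard0001" are not taken from the original report):

```js
// On the primary of the recipient shard (direct connection, not via mongos):
// drop the collection left behind by the interrupted cloning phase.
// "testDB" and "orphanedColl" are purely illustrative names.
db.getSiblingDB("testDB").getCollection("orphanedColl").drop();

// Then, from a mongos, retry the movePrimary operation:
db.adminCommand({ movePrimary: "testDB", to: "shard0001" });
```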
The cloning phase of a movePrimary operation is very expensive in terms of execution time (in production it can take hours), so the cloning operation on the recipient side must be aborted rather than joined. One idea is to tag the _shardsvrCloneCatalogData operation and kill it (using the tag) when the movePrimary operation is recovered by the coordinator, before cleaning up any cloned data.
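Pending a server-side fix, the same effect can be approximated operationally: locate the in-flight _shardsvrCloneCatalogData operation on the recipient with the $currentOp aggregation stage and kill it by opid. A hedged mongosh sketch (the command-field match relies on currentOp reporting the command name as a field of `command`; run against the recipient's primary):

```js
// On the primary of the recipient shard: find the in-flight
// _shardsvrCloneCatalogData operation and kill it by opid.
db.getSiblingDB("admin")
  .aggregate([
    { $currentOp: { allUsers: true } },
    { $match: { "command._shardsvrCloneCatalogData": { $exists: true } } }
  ])
  .forEach(function (op) {
    db.killOp(op.opid);
  });
```

This is only an operational mitigation; the proper fix is for the coordinator itself to interrupt the cloning before performing its cleanup.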