Core Server / SERVER-77768

Prevent DDL ops and migrations from failing transitionFromDedicatedConfigServer

    • Type: Task
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability

      The transitionToDedicatedConfigServer command essentially wraps removeShard: once user data has been moved off the config server, it removes the config server's shard document from config.shards. To enable transitioning from a dedicated config server back to a config shard, the transitionFromDedicatedConfigServer command adds the entry back to config.shards, essentially wrapping addShard.

      If a collection that exists locally on the config server conflicts with an existing namespace in the cluster, addShard, and therefore transitionFromDedicatedConfigServer, will fail, requiring the user to resolve the collision. To allow successive transitions, transitionToDedicatedConfigServer locally drops sharded collections that have been drained of their chunks, after all range deletion tasks have run.
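
      The drop condition described above can be illustrated with a small toy model. This is only a sketch: the function name and its three inputs are hypothetical, it does not call any MongoDB API, and checking range deletion tasks per collection is a simplification made here.

```python
def collections_safe_to_drop(local_sharded, owned_chunks, pending_range_deletions):
    """Pick local sharded collections that could be dropped during the transition.

    All three arguments are hypothetical inputs for this toy model:
      local_sharded: iterable of namespaces present on the config server
      owned_chunks: dict of namespace -> number of chunks still owned locally
      pending_range_deletions: dict of namespace -> outstanding range deletion tasks
    """
    return sorted(
        ns
        for ns in local_sharded
        # Drained of chunks AND no range deletion task left to run.
        if owned_chunks.get(ns, 0) == 0 and pending_range_deletions.get(ns, 0) == 0
    )
```

      For example, with three local collections where one still owns chunks and another still has a pending range deletion task, only the remaining collection would be selected for a local drop.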

      Chunk migrations check whether the recipient shard is draining only at commit time. The balancer may therefore choose to move a chunk to the config shard, and the config shard may then be successfully removed before the migration completes. The migration will correctly fail, but it may leave orphaned data on the config server, which prevents a future transitionFromDedicatedConfigServer from succeeding without user intervention. renameCollection has a similar problem.
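
      The race can be made concrete with a minimal in-memory simulation. Everything here (ToyShard, clone_chunk, commit_migration, run_race) is a hypothetical stand-in invented for illustration, not server code; it only models the ordering of events described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical in-memory stand-in for a shard; not a MongoDB API."""
    name: str
    draining: bool = False
    in_cluster: bool = True
    local_docs: dict = field(default_factory=dict)  # namespace -> list of docs


class MigrationFailed(Exception):
    pass


def clone_chunk(recipient, namespace, docs):
    """Clone phase: copies documents without looking at the draining flag."""
    recipient.local_docs.setdefault(namespace, []).extend(docs)


def commit_migration(recipient):
    """Commit phase: the only point where the recipient's state is checked."""
    if recipient.draining or not recipient.in_cluster:
        raise MigrationFailed(f"{recipient.name} is draining or removed")


def run_race():
    config_shard = ToyShard("config")
    # 1. A migration targets the config shard and starts cloning documents.
    clone_chunk(config_shard, "test.coll", [{"_id": 1}, {"_id": 2}])
    # 2. Meanwhile the transition to dedicated mode completes and removes it.
    config_shard.draining = True
    config_shard.in_cluster = False
    # 3. The commit correctly fails, but the cloned documents stay behind.
    try:
        commit_migration(config_shard)
        committed = True
    except MigrationFailed:
        committed = False
    return committed, config_shard.local_docs
```

      In this model run_race() reports a failed commit while test.coll still exists locally on the config server, which is the leftover namespace that a later transitionFromDedicatedConfigServer would collide with.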

      This is unlikely in practice. The balancer won't move a chunk to a draining shard, so the config shard must have no chunks when the transition to dedicated mode begins. In addition, removeShard waits for all local range deletion documents to be removed, so a migration started after the config shard starts to drain would have to take longer than the default orphan cleanup delay of 15 minutes to insert its range deletion task on the config server.

      The purpose of this ticket is to guarantee transitionFromDedicatedConfigServer can always succeed. Some possible approaches:

      1. Serialize transitionToDedicatedConfigServer with chunk migrations and DDL ops.
      2. Allow transitionFromDedicatedConfigServer to locally drop any local duplicate namespaces, on the assumption that the config server shouldn't have genuine user data.
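
      As a rough illustration of the second approach, the sketch below models the namespace collision check and an opt-in local drop. The function name, the drop_duplicates flag, and the set-based namespace model are all assumptions made for this example; the ticket does not specify an implementation.

```python
def transition_from_dedicated(local_namespaces, cluster_namespaces, drop_duplicates=False):
    """Toy model of re-adding the config server as a shard.

    Returns (succeeded, remaining_local_namespaces, dropped_namespaces).
    """
    local = set(local_namespaces)
    conflicts = local & set(cluster_namespaces)
    dropped = set()
    if conflicts and drop_duplicates:
        # Approach 2: treat conflicting local collections as orphaned leftovers
        # and drop them, assuming the config server holds no genuine user data.
        dropped = conflicts
        local -= conflicts
        conflicts = set()
    succeeded = not conflicts
    return succeeded, sorted(local), sorted(dropped)
```

      Without the flag, a single conflicting namespace makes the modeled transition fail, mirroring today's behavior; with it, the conflicting namespace is dropped locally and the transition succeeds. The trade-off is that this relies entirely on the assumption that such local data is never genuine.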

            backlog-server-cluster-scalability [DO NOT USE] Backlog - Cluster Scalability
            jack.mulrow@mongodb.com Jack Mulrow