[SERVER-74713] Is it valid to run removeShard on your catalogShard? Created: 09/Mar/23  Updated: 30/Mar/23  Resolved: 30/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Joanna Cheng Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Done Votes: 0
Labels: skunkelodeon-odcs
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding NYC
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Initial state:

[direct: mongos] test> sh.status()
shardingVersion
{ _id: 1, clusterId: ObjectId("6407feaf46705024b23e5f69") }
---
shards
[
  {
    _id: 'config',
    host: 'csshard/localhost:27019',
    state: 1,
    topologyTime: Timestamp({ t: 1678245624, i: 1 })
  },
  {
    _id: 'shard2',
    host: 'shard2/localhost:27021',
    state: 1,
    topologyTime: Timestamp({ t: 1678245703, i: 2 })
  }
]
---
active mongoses
[ { '7.0.0-alpha-538-g7cec1b7': 1 } ]
---
autosplit
{ 'Currently enabled': 'yes' }
---
balancer
{ 'Currently enabled': 'yes', 'Currently running': 'no' }
---
databases
[
  {
    database: { _id: 'config', primary: 'config', partitioned: true },
    collections: {
      'config.system.sessions': {
        shardKey: { _id: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'config', nChunks: 1024 } ],
        chunks: [
          'too many chunks to print, use verbose if you want to force print'
        ],
        tags: []
      }
    }
  },
  {
    database: {
      _id: 'test',
      primary: 'shard2',
      partitioned: false,
      version: {
        uuid: UUID("5ed830ec-10da-42f9-b92a-b7e3f78df969"),
        timestamp: Timestamp({ t: 1678245720, i: 1 }),
        lastMod: 1
      }
    },
    collections: {
      'test.bar': {
        shardKey: { a: 1 },
        unique: true,
        balancing: true,
        chunkMetadata: [ { shard: 'shard2', nChunks: 1 } ],
        chunks: [
          { min: { a: MinKey() }, max: { a: MaxKey() }, 'on shard': 'shard2', 'last modified': Timestamp({ t: 1, i: 0 }) }
        ],
        tags: []
      },
      'test.baz': {
        shardKey: { a: 1 },
        unique: true,
        balancing: true,
        chunkMetadata: [ { shard: 'shard2', nChunks: 1 } ],
        chunks: [
          { min: { a: MinKey() }, max: { a: MaxKey() }, 'on shard': 'shard2', 'last modified': Timestamp({ t: 1, i: 0 }) }
        ],
        tags: []
      },
      'test.shards': {
        shardKey: { a: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'config', nChunks: 1 } ],
        chunks: [
          { min: { a: MinKey() }, max: { a: MaxKey() }, 'on shard': 'config', 'last modified': Timestamp({ t: 1, i: 4 }) }
        ],
        tags: []
      }
    }
  }
]

Start-to-end removal:

[direct: mongos] test> db.adminCommand({removeShard: "config"})
{
  msg: 'draining started successfully',
  state: 'started',
  shard: 'config',
  note: 'you need to drop or movePrimary these databases',
  dbsToMove: [],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1678337082, i: 3 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1678337082, i: 3 })
}

[direct: mongos] test> db.adminCommand({removeShard: "config"})
{
  msg: 'removeshard completed successfully',
  state: 'completed',
  shard: 'config',
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1678338347, i: 4 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1678338347, i: 4 })
}
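
In this repro the drain happened to finish between the two calls, but in general removeShard has to be re-run until it reports state: 'completed'. A minimal mongosh polling sketch (assumes the same mongos connection as above; the one-second poll interval is arbitrary):

let res = db.adminCommand({ removeShard: "config" });
while (res.state !== "completed") {
  sleep(1000); // mongosh built-in; wait one second between polls
  res = db.adminCommand({ removeShard: "config" });
}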

End state:

[direct: mongos] test> sh.status()
shardingVersion
{ _id: 1, clusterId: ObjectId("6407feaf46705024b23e5f69") }
---
shards
[
  {
    _id: 'shard2',
    host: 'shard2/localhost:27021',
    state: 1,
    topologyTime: Timestamp({ t: 1678338347, i: 1 })
  }
]
---
active mongoses
[ { '7.0.0-alpha-538-g7cec1b7': 1 } ]
---
autosplit
{ 'Currently enabled': 'yes' }
---
balancer
{ 'Currently enabled': 'yes', 'Currently running': 'no' }
---
databases
[
  {
    database: { _id: 'config', primary: 'config', partitioned: true },
    collections: {
      'config.system.sessions': {
        shardKey: { _id: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'shard2', nChunks: 1024 } ],
        chunks: [
          'too many chunks to print, use verbose if you want to force print'
        ],
        tags: []
      }
    }
  },
  {
    database: {
      _id: 'test',
      primary: 'shard2',
      partitioned: false,
      version: {
        uuid: UUID("5ed830ec-10da-42f9-b92a-b7e3f78df969"),
        timestamp: Timestamp({ t: 1678245720, i: 1 }),
        lastMod: 1
      }
    },
    collections: {
      'test.bar': {
        shardKey: { a: 1 },
        unique: true,
        balancing: true,
        chunkMetadata: [ { shard: 'shard2', nChunks: 1 } ],
        chunks: [
          { min: { a: MinKey() }, max: { a: MaxKey() }, 'on shard': 'shard2', 'last modified': Timestamp({ t: 1, i: 0 }) }
        ],
        tags: []
      },
      'test.baz': {
        shardKey: { a: 1 },
        unique: true,
        balancing: true,
        chunkMetadata: [ { shard: 'shard2', nChunks: 1 } ],
        chunks: [
          { min: { a: MinKey() }, max: { a: MaxKey() }, 'on shard': 'shard2', 'last modified': Timestamp({ t: 1, i: 0 }) }
        ],
        tags: []
      },
      'test.shards': {
        shardKey: { a: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'shard2', nChunks: 1 } ],
        chunks: [
          { min: { a: MinKey() }, max: { a: MaxKey() }, 'on shard': 'shard2', 'last modified': Timestamp({ t: 2, i: 0 }) }
        ],
        tags: []
      }
    }
  }
]

Question 1: Is it valid to do this and stop here, so that the old config shard now simply functions as a dedicated CSRS?

Trying to re-add the catalog shard afterwards gives an error:

[direct: mongos] test> db.adminCommand({ transitionToCatalogShard: 1 });
MongoServerError: can't add shard 'csshard/localhost:27019' because a local database 'test' exists in another shard2

However, dropping that database on csshard (it's empty anyway; all the data was moved) and then re-running the command works.
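
The drop itself was just a dropDatabase() against the csshard primary, roughly as follows (a sketch; the connection string is inferred from the 'csshard/localhost:27019' host string in sh.status()):

// run directly against the csshard (CSRS) primary, not through mongos
mongosh "mongodb://localhost:27019" --eval 'db.getSiblingDB("test").dropDatabase()'

With the leftover local database gone, the transition succeeds: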

[direct: mongos] test> db.adminCommand({ transitionToCatalogShard: 1 });
{
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1678339266, i: 4 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1678339266, i: 3 })
}

Question 2: Is this sequence of events valid to get back to the original state, or is there something else I should be aware of?



 Comments   
Comment by Jack Mulrow [ 09/Mar/23 ]

Question 1: Is it valid to do this, and stop here - and now the old config shard can just function as a CSRS

We have a wrapper command, transitionToDedicatedConfigServer, that users are supposed to use, but internally it just calls removeShard, so effectively yes, this should work. We disallow running addShard with the "config" name but haven't yet disallowed removeShard with the "config" name; doing so before we release this should hopefully prevent some confusion for users.
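
In other words, the user-facing flow is expected to look roughly like this (a sketch based on the above; like removeShard, the transition command would be re-run until it reports completion):

// start/continue draining the config shard's data; poll until state: 'completed'
db.adminCommand({ transitionToDedicatedConfigServer: 1 })

// addShard with the reserved 'config' name is rejected, so a config shard cannot
// be re-created through the generic command:
db.adminCommand({ addShard: "csshard/localhost:27019", name: "config" })  // expected to fail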

Question 2: Is this sequence of events valid to get back to the original state, or is there something else I should be aware of?

Yes, running transitionToCatalogShard should be valid for getting back to the original state. Conceptually it's like removing and then adding a shard again; we just used wrapper commands to (hopefully) improve the UX and give us a place for any "config"-shard-specific logic. For example, we're planning on automatically dropping drained sharded collections from the config server during transitionToDedicatedConfigServer, so users don't hit the "can't add shard because a local database exists in another shard" error you hit.
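
Put together, the round trip today looks roughly like this (sketch only; hosts and database names are taken from the repro above, and the manual drop step should go away once the automatic cleanup lands):

// 1. drain and remove the config shard; re-run until state: 'completed'
db.adminCommand({ transitionToDedicatedConfigServer: 1 })

// 2. for now, manually drop any leftover local databases on the CSRS
//    (e.g. the empty 'test' database from this ticket)

// 3. turn the dedicated config server back into a catalog shard
db.adminCommand({ transitionToCatalogShard: 1 })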
