SERVER-87206 (Core Server): Resharding adds a draining shard as recipient on 5.0.24

    • Type: Task
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Catalog and Routing
    • Sprint: CAR Team 2024-03-04, CAR Team 2024-03-18, CAR Team 2024-04-01, CAR Team 2024-04-15

      After SERVER-65666, resharding should not add draining shards as recipients, but in my run on 5.0.24 the draining shard was added as a recipient.

      Steps to reproduce: 

      • Two-shard cluster in Atlas: shard0 and shard1. I can share the logs if you'd like; let me know.
      • shard1 is the primary shard of testDB.
      • test1TBCollection has an equal distribution of chunks across both shards.
      • Remove shard1 using the Atlas UI.
      • Confirm shard1 is no longer visible in the Atlas cluster builder, i.e. it is draining.
      • Confirm that chunks are being moved to shard0 using sh.status() (or query the config database directly; see the sketch after this list).
      • Run resharding:
        db.adminCommand({ reshardCollection: "testDB.test1TBCollection", key: { _id: 1 } })
      • Monitor the resharding operation to confirm recipients and donors.
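
      For reference, the draining state and the chunk distribution can also be checked from mongosh rather than the Atlas UI. A minimal sketch, assuming 5.0's UUID-keyed config.chunks and the namespace from this repro (run against mongos):

        // A shard with an outstanding removeShard carries draining: true
        // on its config.shards document.
        const configDB = db.getSiblingDB("config");
        configDB.shards.find({}, { _id: 1, draining: 1 }).forEach(printjson);

        // Chunk counts per shard; in 5.0+ config.chunks is keyed by the
        // collection UUID rather than the namespace string.
        const collEntry = configDB.collections.findOne({ _id: "testDB.test1TBCollection" });
        configDB.chunks.aggregate([
          { $match: { uuid: collEntry.uuid } },
          { $group: { _id: "$shard", chunks: { $sum: 1 } } }
        ]).forEach(printjson);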

      Monitoring output lists both shards as donor and recipient: 

      Atlas [mongos] testDB> db.getSiblingDB("admin").aggregate([
        { $currentOp: { allUsers: true, localOps: false } },
        { $match: { type: "op", "originatingCommand.reshardCollection": "testDB.test1TBCollection" } }
      ])
      [
        {
          shard: 'atlas-10zagv-shard-0',
          type: 'op',
          desc: 'ReshardingRecipientService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: {
            reshardCollection: 'testDB.test1TBCollection',
            key: { _id: 1 },
            unique: false,
            collation: { locale: 'simple' }
          },
          totalOperationTimeElapsedSecs: Long('19'),
          remainingOperationTimeEstimatedSecs: Long('2'),
          approxDocumentsToCopy: Long('56283'),
          documentsCopied: Long('100000'),
          approxBytesToCopy: Long('65119332'),
          bytesCopied: Long('115699281'),
          totalCopyTimeElapsedSecs: Long('18'),
          oplogEntriesFetched: Long('38'),
          oplogEntriesApplied: Long('0'),
          totalApplyTimeElapsedSecs: Long('0'),
          recipientState: 'cloning',
          opStatus: 'running',
          oplogApplierApplyBatchLatencyMillis: {
            '(-inf, 10)': { count: Long('0') },
            '[10, 100)': { count: Long('0') },
            '[100, 1000)': { count: Long('0') },
            '[1000, 10000)': { count: Long('0') },
            '[10000, inf)': { count: Long('0') },
            totalCount: Long('0')
          },
          collClonerFillBatchForInsertLatencyMillis: {
            '(-inf, 10)': { count: Long('1115') },
            '[10, 100)': { count: Long('3') },
            '[100, 1000)': { count: Long('7') },
            '[1000, 10000)': { count: Long('0') },
            '[10000, inf)': { count: Long('0') },
            totalCount: Long('1125')
          }
        },
        {
          shard: 'atlas-10zagv-shard-0',
          type: 'op',
          desc: 'ReshardingDonorService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: {
            reshardCollection: 'testDB.test1TBCollection',
            key: { _id: 1 },
            unique: false,
            collation: { locale: 'simple' }
          },
          totalOperationTimeElapsedSecs: Long('19'),
          countWritesDuringCriticalSection: Long('0'),
          totalCriticalSectionTimeElapsedSecs: Long('0'),
          donorState: 'donating-initial-data',
          opStatus: 'running'
        },
        {
          shard: 'atlas-10zagv-shard-1',
          type: 'op',
          desc: 'ReshardingRecipientService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: {
            reshardCollection: 'testDB.test1TBCollection',
            key: { _id: 1 },
            unique: false,
            collation: { locale: 'simple' }
          },
          totalOperationTimeElapsedSecs: Long('18'),
          remainingOperationTimeEstimatedSecs: Long('-1'),
          approxDocumentsToCopy: Long('56283'),
          documentsCopied: Long('0'),
          approxBytesToCopy: Long('65119332'),
          bytesCopied: Long('0'),
          totalCopyTimeElapsedSecs: Long('18'),
          oplogEntriesFetched: Long('38'),
          oplogEntriesApplied: Long('0'),
          totalApplyTimeElapsedSecs: Long('0'),
          recipientState: 'cloning',
          opStatus: 'running',
          oplogApplierApplyBatchLatencyMillis: {
            '(-inf, 10)': { count: Long('0') },
            '[10, 100)': { count: Long('0') },
            '[100, 1000)': { count: Long('0') },
            '[1000, 10000)': { count: Long('0') },
            '[10000, inf)': { count: Long('0') },
            totalCount: Long('0')
          },
          collClonerFillBatchForInsertLatencyMillis: {
            '(-inf, 10)': { count: Long('0') },
            '[10, 100)': { count: Long('0') },
            '[100, 1000)': { count: Long('1') },
            '[1000, 10000)': { count: Long('0') },
            '[10000, inf)': { count: Long('0') },
            totalCount: Long('1')
          }
        },
        {
          shard: 'atlas-10zagv-shard-1',
          type: 'op',
          desc: 'ReshardingDonorService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: {
            reshardCollection: 'testDB.test1TBCollection',
            key: { _id: 1 },
            unique: false,
            collation: { locale: 'simple' }
          },
          totalOperationTimeElapsedSecs: Long('18'),
          countWritesDuringCriticalSection: Long('0'),
          totalCriticalSectionTimeElapsedSecs: Long('0'),
          donorState: 'donating-initial-data',
          opStatus: 'running'
        }
      ]
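
      In the output above, atlas-10zagv-shard-1 (the shard being drained) reports a ReshardingRecipientService, which is what SERVER-65666 was supposed to prevent. The roles can be summarized per shard and cross-checked against the draining flag with a hedged mongosh sketch; it assumes the shard names reported by $currentOp match the _id values in config.shards (the usual Atlas convention):

        // Re-run the $currentOp pipeline from above and collect the results.
        const ops = db.getSiblingDB("admin").aggregate([
          { $currentOp: { allUsers: true, localOps: false } },
          { $match: { type: "op", "originatingCommand.reshardCollection": "testDB.test1TBCollection" } }
        ]).toArray();

        // Shards with an outstanding removeShard have draining: true.
        const draining = new Set(
          db.getSiblingDB("config").shards.find({ draining: true }).toArray().map(s => s._id)
        );

        ops.forEach(op => {
          // desc looks like 'ReshardingRecipientService <reshardingUUID>'.
          const role = op.desc.startsWith("ReshardingRecipientService") ? "recipient"
                     : op.desc.startsWith("ReshardingDonorService")     ? "donor"
                     : "other";
          print(`${op.shard}: ${role}${draining.has(op.shard) ? "  <-- draining shard" : ""}`);
        });

      With shard1 draining, only shard0 should appear in a recipient role; here both shards do.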

      Attachments:
        • server-87206.js (2 kB, uploaded by Antonio Fuschetto)

            Assignee: Antonio Fuschetto (antonio.fuschetto@mongodb.com)
            Reporter: Ratika Gandhi (ratika.gandhi@mongodb.com)
            Votes: 0
            Watchers: 3
