Core Server / SERVER-87206

Resharding adds a draining shard as recipient on 5.0.24

    • Type: Task
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • CAR Team 2024-03-04, CAR Team 2024-03-18, CAR Team 2024-04-01, CAR Team 2024-04-15

      After SERVER-65666, resharding should not add draining shards as recipients, but in my run on 5.0.24 resharding added the draining shard as a recipient.
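      For reference, a draining shard can be identified directly from the sharding metadata. A minimal mongosh sketch (not part of the original run; it assumes the removeShard flow flags the shard in config.shards, which is the documented behavior):

        // Shards being removed carry draining: true in config.shards until removeShard completes.
        db.getSiblingDB("config").shards.find({ draining: true }, { _id: 1, host: 1, draining: 1 })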

      Steps to reproduce: 

      • Two-shard cluster in Atlas: shard0 and shard1. I can share the logs if needed; let me know.
      • Shard1 is the primary shard of testDB
      • test1TBCollection has an equal distribution of chunks on both shards 
      • Remove shard1 using the Atlas UI
      • Confirm shard1 is no longer visible in the Atlas cluster builder, i.e. it is draining
      • Confirm that chunks are being moved to shard0 using sh.status() (a mongosh alternative is sketched after this list)
      • Run resharding
        db.adminCommand({ reshardCollection: "testDB.test1TBCollection", key: {_id: 1}})
      • Monitor resharding to confirm recipients and donors
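      As a mongosh alternative to the Atlas UI and sh.status() checks above, a rough sketch (assumed, not part of the original run) to watch chunk counts per shard while shard1 drains:

        // Counts chunks per shard across all collections; while shard1 is draining,
        // its count should trend toward zero and shard0's should grow.
        db.getSiblingDB("config").chunks.aggregate([
          { $group: { _id: "$shard", chunkCount: { $sum: 1 } } },
          { $sort: { _id: 1 } }
        ])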

      The monitoring output from the last step lists both shards as both donor and recipient, even though shard1 is draining:

      Atlas [mongos] testDB> db.getSiblingDB("admin").aggregate([ { $currentOp: { allUsers: true, localOps: false } }, { $match: { type: "op", "originatingCommand.reshardCollection": "testDB.test1TBCollection" } } ])
      [
        {
          shard: 'atlas-10zagv-shard-0',
          type: 'op',
          desc: 'ReshardingRecipientService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: { reshardCollection: 'testDB.test1TBCollection', key: { _id: 1 }, unique: false, collation: { locale: 'simple' } },
          totalOperationTimeElapsedSecs: Long('19'),
          remainingOperationTimeEstimatedSecs: Long('2'),
          approxDocumentsToCopy: Long('56283'),
          documentsCopied: Long('100000'),
          approxBytesToCopy: Long('65119332'),
          bytesCopied: Long('115699281'),
          totalCopyTimeElapsedSecs: Long('18'),
          oplogEntriesFetched: Long('38'),
          oplogEntriesApplied: Long('0'),
          totalApplyTimeElapsedSecs: Long('0'),
          recipientState: 'cloning',
          opStatus: 'running',
          oplogApplierApplyBatchLatencyMillis: { '(-inf, 10)': { count: Long('0') }, '[10, 100)': { count: Long('0') }, '[100, 1000)': { count: Long('0') }, '[1000, 10000)': { count: Long('0') }, '[10000, inf)': { count: Long('0') }, totalCount: Long('0') },
          collClonerFillBatchForInsertLatencyMillis: { '(-inf, 10)': { count: Long('1115') }, '[10, 100)': { count: Long('3') }, '[100, 1000)': { count: Long('7') }, '[1000, 10000)': { count: Long('0') }, '[10000, inf)': { count: Long('0') }, totalCount: Long('1125') }
        },
        {
          shard: 'atlas-10zagv-shard-0',
          type: 'op',
          desc: 'ReshardingDonorService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: { reshardCollection: 'testDB.test1TBCollection', key: { _id: 1 }, unique: false, collation: { locale: 'simple' } },
          totalOperationTimeElapsedSecs: Long('19'),
          countWritesDuringCriticalSection: Long('0'),
          totalCriticalSectionTimeElapsedSecs: Long('0'),
          donorState: 'donating-initial-data',
          opStatus: 'running'
        },
        {
          shard: 'atlas-10zagv-shard-1',
          type: 'op',
          desc: 'ReshardingRecipientService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: { reshardCollection: 'testDB.test1TBCollection', key: { _id: 1 }, unique: false, collation: { locale: 'simple' } },
          totalOperationTimeElapsedSecs: Long('18'),
          remainingOperationTimeEstimatedSecs: Long('-1'),
          approxDocumentsToCopy: Long('56283'),
          documentsCopied: Long('0'),
          approxBytesToCopy: Long('65119332'),
          bytesCopied: Long('0'),
          totalCopyTimeElapsedSecs: Long('18'),
          oplogEntriesFetched: Long('38'),
          oplogEntriesApplied: Long('0'),
          totalApplyTimeElapsedSecs: Long('0'),
          recipientState: 'cloning',
          opStatus: 'running',
          oplogApplierApplyBatchLatencyMillis: { '(-inf, 10)': { count: Long('0') }, '[10, 100)': { count: Long('0') }, '[100, 1000)': { count: Long('0') }, '[1000, 10000)': { count: Long('0') }, '[10000, inf)': { count: Long('0') }, totalCount: Long('0') },
          collClonerFillBatchForInsertLatencyMillis: { '(-inf, 10)': { count: Long('0') }, '[10, 100)': { count: Long('0') }, '[100, 1000)': { count: Long('1') }, '[1000, 10000)': { count: Long('0') }, '[10000, inf)': { count: Long('0') }, totalCount: Long('1') }
        },
        {
          shard: 'atlas-10zagv-shard-1',
          type: 'op',
          desc: 'ReshardingDonorService bc0e3d6c-7f52-4b5e-9ab3-447444980604',
          op: 'command',
          ns: 'testDB.test1TBCollection',
          originatingCommand: { reshardCollection: 'testDB.test1TBCollection', key: { _id: 1 }, unique: false, collation: { locale: 'simple' } },
          totalOperationTimeElapsedSecs: Long('18'),
          countWritesDuringCriticalSection: Long('0'),
          totalCriticalSectionTimeElapsedSecs: Long('0'),
          donorState: 'donating-initial-data',
          opStatus: 'running'
        }
      ]
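      To cross-check which shards the resharding coordinator actually recorded as participants (independently of $currentOp), one can also inspect the coordinator document. This is an assumed sketch; the collection name config.reshardingOperations and the donorShards/recipientShards/id field names reflect my understanding of the coordinator document and may need adjusting:

        // Read the coordinator document for the collection being resharded and list
        // the shard ids it recorded as donors and recipients (assumed document layout).
        const op = db.getSiblingDB("config").reshardingOperations.findOne({ ns: "testDB.test1TBCollection" });
        printjson({
          donors: op.donorShards.map(s => s.id),
          recipients: op.recipientShards.map(s => s.id)  // per SERVER-65666 the draining shard1 should not appear here
        });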

            Assignee: Antonio Fuschetto (antonio.fuschetto@mongodb.com)
            Reporter: Ratika Gandhi (ratika.gandhi@mongodb.com)
            Votes: 0
            Watchers: 3
