[DOCS-16159] Investigate changes in SERVER-72146: Make chunk migrations metrics more accessible from Atlas Created: 24/May/23  Updated: 13/Nov/23  Resolved: 12/Jul/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 6.0.6, 7.1.0-rc0, 5.0.18, 6.3.2, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Jason Price
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-72146 Make chunk migrations metrics more ac... Closed
Participants:
Days since reply: 37 weeks ago

 Description   
Original Downstream Change Summary

After these commits, the following sharding statistics are available through serverStatus in all LTS versions of MongoDB:

  • countDocsClonedOnCatchUpOnRecipient: the number of documents cloned during the catch up phase of the migration
  • countBytesClonedOnCatchUpOnRecipient: the number of bytes cloned during the catch up phase of the migration
  • countBytesClonedOnRecipient: the number of bytes cloned by the recipient of a migration
  • countDonorMoveChunkCommitted: the total number of migrations committed by the node
  • countDonorMoveChunkAborted: the number of migrations aborted in the node
  • totalDonorMoveChunkTimeMillis: the total amount of time a migration took from beginning to end
  • totalRecipientCriticalSectionTimeMillis: the amount of time in milliseconds the recipient of a migration spent holding the critical section
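
As a rough illustration of reading these counters, the sketch below pulls the listed fields out of a serverStatus result that has already been fetched as a Python dict. It assumes the fields sit under the shardingStatistics group, as the linked ticket requests. The helper name extract_migration_metrics and the sample values are hypothetical, not taken from the server.

```python
from typing import Any, Dict

# Field names as listed in this ticket.
MIGRATION_FIELDS = (
    "countDocsClonedOnCatchUpOnRecipient",
    "countBytesClonedOnCatchUpOnRecipient",
    "countBytesClonedOnRecipient",
    "countDonorMoveChunkCommitted",
    "countDonorMoveChunkAborted",
    "totalDonorMoveChunkTimeMillis",
    "totalRecipientCriticalSectionTimeMillis",
)


def extract_migration_metrics(server_status: Dict[str, Any]) -> Dict[str, Any]:
    """Return the migration counters that are present in a serverStatus document.

    Assumption: the counters live under the "shardingStatistics" group.
    Fields that are absent (for example on an older version) are skipped.
    """
    stats = server_status.get("shardingStatistics", {})
    return {name: stats[name] for name in MIGRATION_FIELDS if name in stats}


# Hypothetical sample document; the values are made up for illustration.
sample_status = {
    "shardingStatistics": {
        "countDonorMoveChunkCommitted": 4,
        "countDonorMoveChunkAborted": 1,
        "totalDonorMoveChunkTimeMillis": 8000,
    }
}

metrics = extract_migration_metrics(sample_status)
print(metrics)
```

With a driver such as PyMongo, the input document would typically come from running the serverStatus command against the admin database on a shard node.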

    Description of Linked Ticket

    Often, when investigating HELP tickets related to balancing, we need to access and combine data from FTDC, logs, and configdump to figure out some basic metrics, such as:

  • Migration throughput (how fast is this shard cloning data)
  • Range deleter throughput (how fast is this shard executing its range deletions)
  • Number of orphan documents (how many orphan documents are waiting to be deleted)

The following statistics should be available on serverStatus under the shardingStatistics group:

  • countDocsClonedOnCatchUpOnRecipient: the number of documents cloned during the catch up phase of the migration
  • countBytesClonedOnCatchUpOnRecipient: the number of bytes cloned during the catch up phase of the migration
  • countBytesClonedOnRecipient: the number of bytes cloned by the recipient of a migration
  • countDonorMoveChunkCommitted: the total number of migrations committed by the node
  • countDonorMoveChunkAborted: the number of migrations aborted in the node
  • totalDonorMoveChunkTimeMillis: the total amount of time a migration took from beginning to end
  • totalRecipientCriticalSectionTimeMillis: the amount of time in milliseconds the recipient of a migration spent holding the critical section
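
One way the migration throughput mentioned above might be estimated from these statistics is to sample serverStatus twice and divide the change in countBytesClonedOnRecipient by the elapsed time. The sketch below assumes the counter is cumulative and only resets when the process restarts. The function name, the sample documents, and the 60-second interval are hypothetical.

```python
from typing import Any, Dict


def clone_throughput_bytes_per_sec(
    earlier: Dict[str, Any], later: Dict[str, Any], elapsed_seconds: float
) -> float:
    """Estimate recipient-side clone throughput between two serverStatus samples.

    Assumption: countBytesClonedOnRecipient is a cumulative counter under
    "shardingStatistics", so the difference between samples is bytes cloned.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    before = earlier["shardingStatistics"]["countBytesClonedOnRecipient"]
    after = later["shardingStatistics"]["countBytesClonedOnRecipient"]
    delta = after - before
    if delta < 0:
        # A smaller later value suggests the counter was reset between samples.
        raise ValueError("counter decreased between samples")
    return delta / elapsed_seconds


# Hypothetical samples taken 60 seconds apart; values are made up.
first_sample = {"shardingStatistics": {"countBytesClonedOnRecipient": 1_000_000}}
second_sample = {"shardingStatistics": {"countBytesClonedOnRecipient": 4_000_000}}

throughput = clone_throughput_bytes_per_sec(first_sample, second_sample, 60.0)
print(f"{throughput:.0f} bytes/s")
```

The same two-sample approach would apply to the other cumulative counters, for example dividing the change in totalDonorMoveChunkTimeMillis by the change in countDonorMoveChunkCommitted to approximate an average donor-side migration time.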


 Comments   
Comment by Julia Malkin [ 24/May/23 ]

Note to the docs owner on the server side: If this will have impact on the Atlas side, please file a clone DOCSP. Thank you!
